eshitachandwani
Member

This PR moves the LDS and RDS watchers to the dependency manager without changing the current functionality or behaviour. This is part of the implementation of gRFC A74.

RELEASE NOTES: None

eshitachandwani added this to the 1.77 Release milestone on Oct 14, 2025
eshitachandwani added the labels "Type: Internal Cleanup" (Refactors, etc) and "Area: xDS" (Includes everything xDS related, including LB policies used with xDS) on Oct 14, 2025

codecov bot commented Oct 15, 2025

Codecov Report

❌ Patch coverage is 71.27072% with 52 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.86%. Comparing base (ae62635) to head (ed60964).
⚠️ Report is 1 commit behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...xds/xdsdependencymanager/xds_dependency_manager.go | 70.64% | 23 Missing and 9 partials ⚠️ |
| internal/xds/xdsclient/xdsresource/xdsconfig.go | 0.00% | 9 Missing ⚠️ |
| internal/grpctest/tlogger.go | 69.23% | 6 Missing and 2 partials ⚠️ |
| internal/xds/xdsdependencymanager/watch_service.go | 91.42% | 3 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8651      +/-   ##
==========================================
+ Coverage   81.21%   81.86%   +0.65%     
==========================================
  Files         416      420       +4     
  Lines       41002    40969      -33     
==========================================
+ Hits        33298    33540     +242     
+ Misses       6226     6050     -176     
+ Partials     1478     1379      -99     

| Files with missing lines | Coverage Δ |
|---|---|
| internal/xds/xdsdependencymanager/logging.go | 100.00% <100.00%> (ø) |
| internal/xds/xdsdependencymanager/watch_service.go | 91.42% <91.42%> (ø) |
| internal/grpctest/tlogger.go | 70.48% <69.23%> (-0.45%) ⬇️ |
| internal/xds/xdsclient/xdsresource/xdsconfig.go | 0.00% <0.00%> (ø) |
| ...xds/xdsdependencymanager/xds_dependency_manager.go | 70.64% <70.64%> (ø) |

... and 33 files with indirect coverage changes

Contributor

easwars commented Oct 15, 2025

The tests are failing. Is this ready for review?

Comment on lines +22 to +23
// XDSConfig holds the complete and resolved xDS resource configuration
// including LDS, RDS, CDS and endpoints.

Contributor

Suggested change
- // XDSConfig holds the complete and resolved xDS resource configuration
- // including LDS, RDS, CDS and endpoints.
+ // XDSConfig holds the complete gRPC client-side xDS configuration
+ // containing all necessary resources.

// including LDS, RDS, CDS and endpoints.
type XDSConfig struct {
	// Listener is the listener resource update
	Listener ListenerUpdate

Contributor

We are moving the ResourceChanged methods on the resource watchers to accept a pointer to the update struct (instead of accepting the update by value). So, I think it would make sense for us to store them as pointers here as well.

See: #8652
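
A minimal sketch of what storing pointers could look like. The field types here are simplified, hypothetical stand-ins for the real `xdsresource` types, and the exact set of fields is an assumption:

```go
package main

import "fmt"

// ListenerUpdate and RouteConfigUpdate are simplified stand-ins for the real
// xdsresource types. Their fields are hypothetical.
type ListenerUpdate struct{ RouteConfigName string }
type RouteConfigUpdate struct{ VirtualHosts []string }

// XDSConfig stores pointers, so the value handed to a watcher callback can be
// retained as-is instead of being copied into the struct.
type XDSConfig struct {
	Listener    *ListenerUpdate
	RouteConfig *RouteConfigUpdate
}

func main() {
	lis := &ListenerUpdate{RouteConfigName: "route-a"}
	cfg := XDSConfig{Listener: lis}
	fmt.Println(cfg.Listener == lis)    // the same pointer is shared
	fmt.Println(cfg.RouteConfig == nil) // not yet received
}
```

With pointers, a nil field also gives a natural way to express "not received yet", which a by-value field cannot.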

// XDSConfig holds the complete and resolved xDS resource configuration
// including LDS, RDS, CDS and endpoints.
type XDSConfig struct {
// Listener is the listener resource update

Contributor

Nit: Let's try to consistently use the word configuration or config instead of update in these docstrings.

So, maybe something like:
// Listener holds the listener configuration.

Comment on lines +28 to +29
// RouteConfig is the route configuration resource update. It will be
// populated even if RouteConfig is inlined into the Listener resource.

Contributor

Suggested change
- // RouteConfig is the route configuration resource update. It will be
- // populated even if RouteConfig is inlined into the Listener resource.
+ // RouteConfig holds the route configuration. It will be
+ // populated even if the route configuration was inlined into the Listener resource.

Comment on lines +32 to +33
// VirtualHost is the virtual host from the route configuration matched with
// dataplane authority .

Contributor

Maybe?

Suggested change
- // VirtualHost is the virtual host from the route configuration matched with
- // dataplane authority .
+ // VirtualHost selected from the route configuration whose domain field
+ // offers the best match against the provided dataplane authority.
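
To illustrate what "best match" can mean here, the following is a simplified sketch of domain matching, not the actual `FindBestMatchingVirtualHost` implementation. It assumes the usual xDS precedence (exact, then `*suffix`, then `prefix*`, then `*`), breaks ties by the longer domain pattern, and ignores case folding. All type and function names are hypothetical:

```go
package main

import (
	"fmt"
	"strings"
)

type matchKind int

// Ordered from weakest to strongest, so kinds can be compared with >.
const (
	noMatch matchKind = iota
	universalMatch // "*"
	prefixMatch    // "api.*"
	suffixMatch    // "*.example.com"
	exactMatch     // "api.example.com"
)

type VirtualHost struct {
	Name    string
	Domains []string
}

func classify(domain, authority string) matchKind {
	switch {
	case domain == authority:
		return exactMatch
	case domain == "*":
		return universalMatch
	case strings.HasPrefix(domain, "*") && strings.HasSuffix(authority, domain[1:]):
		return suffixMatch
	case strings.HasSuffix(domain, "*") && strings.HasPrefix(authority, domain[:len(domain)-1]):
		return prefixMatch
	}
	return noMatch
}

// bestVirtualHost returns the virtual host with the strongest matching
// domain, or nil if no domain matches the authority.
func bestVirtualHost(authority string, vhs []*VirtualHost) *VirtualHost {
	var best *VirtualHost
	bestKind, bestLen := noMatch, 0
	for _, vh := range vhs {
		for _, d := range vh.Domains {
			k := classify(d, authority)
			if k == noMatch {
				continue
			}
			if k > bestKind || (k == bestKind && len(d) > bestLen) {
				best, bestKind, bestLen = vh, k, len(d)
			}
		}
	}
	return best
}

func main() {
	vhs := []*VirtualHost{
		{Name: "catch-all", Domains: []string{"*"}},
		{Name: "suffix", Domains: []string{"*.example.com"}},
		{Name: "exact", Domains: []string{"api.example.com"}},
	}
	for _, authority := range []string{"api.example.com", "web.example.com", "other.org"} {
		fmt.Println(authority, "->", bestVirtualHost(authority, vhs).Name)
	}
}
```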

Comment on lines +36 to +42
// Clusters maps the cluster name with the ClusterResult which will have
// either the cluster configuration or error. It will have an error status
// if either
//
// (a) there was an error and we did not already have a valid resource or
//
// (b) the resource does not exist.

Contributor

How about making it much simpler and leaving more of the documentation to the individual structs?

// Clusters is a map from cluster name to its configuration.

Clusters map[string]*ClusterResult
}

// ClusterResult contains either a cluster's configuration or an error.

Contributor

Something like?

// ClusterResult contains a cluster's configuration when we receive a 
// valid resource from the management server. It contains an error when:
// - we receive an invalid resource from the management server and
//   we did not already have a valid resource or
// - the cluster resource does not exist on the management server

Err error
}

// ClusterConfig contains cluster configuration for a single cluster.

Contributor

Nit: ClusterConfig contains configuration for a single cluster.

// ClusterConfig contains cluster configuration for a single cluster.
type ClusterConfig struct {
Cluster ClusterUpdate // Cluster configuration. Always present.
EndpointConfig EndpointConfig // Endpoint configuration for leaf clusters which will of type EDS or DNS.

Contributor

I think just "Endpoint configuration for leaf clusters" should suffice.

AggregateConfig AggregateConfig // List of children for aggregate clusters.
}

// AggregateConfig contains a list of leaf cluster names.

Contributor

This need not technically be all leaf clusters. Aggregate clusters can have children that are aggregate clusters as well.
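
To make the nesting concrete, here is a small sketch that walks an aggregate cluster tree and collects the leaves. The `cluster` type and `leafClusters` function are hypothetical. The sketch treats a cluster with no children (or one missing from the map) as a leaf, and visits each name once so that repeated children or cycles do not loop forever:

```go
package main

import "fmt"

// cluster is a hypothetical stand-in: a non-empty children list marks an
// aggregate cluster, and an empty one marks a leaf.
type cluster struct{ children []string }

// leafClusters returns the leaves reachable from root in depth-first order.
func leafClusters(root string, clusters map[string]cluster) []string {
	var leaves []string
	seen := map[string]bool{}
	var walk func(name string)
	walk = func(name string) {
		if seen[name] {
			return
		}
		seen[name] = true
		c, ok := clusters[name]
		if !ok || len(c.children) == 0 {
			leaves = append(leaves, name)
			return
		}
		for _, child := range c.children {
			walk(child)
		}
	}
	walk(root)
	return leaves
}

func main() {
	clusters := map[string]cluster{
		"root":   {children: []string{"a", "nested"}},
		"nested": {children: []string{"b", "c"}}, // an aggregate child
		"a":      {}, "b": {}, "c": {},
	}
	fmt.Println(leafClusters("root", clusters))
}
```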

LeafClusters []string
}

// EndpointConfig contains resolved endpoints for a leaf cluster either from DNS

Contributor

Technically, this contains more than just resolved endpoints, at least for the EDS case. So, maybe the comment can be more generic.

// EndpointConfig contains configuration corresponding to the endpoints in a cluster.

And we should also clarify that only one of the three fields can be populated at any given point in time.

Contributor

Or is it the case that ResolutionNote can have a non-nil error even when one of EDSUpdate or DNSEndpoints is set? If so, we need to clarify that.
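
If the stricter reading applies (exactly one field set at a time), the invariant could be expressed as a small check like the sketch below. The field names come from the comments above, but their types and the `validate` helper are assumptions. If `ResolutionNote` is allowed alongside the other fields, the check would need to be relaxed accordingly:

```go
package main

import (
	"errors"
	"fmt"
)

// EndpointConfig is a simplified stand-in. The field types are hypothetical.
type EndpointConfig struct {
	EDSUpdate      *string
	DNSEndpoints   []string
	ResolutionNote error
}

// errNoEndpoints is an illustrative error value.
var errNoEndpoints = errors.New("no endpoints resolved")

// validate reports an error unless exactly one field is populated.
func (c EndpointConfig) validate() error {
	populated := 0
	if c.EDSUpdate != nil {
		populated++
	}
	if len(c.DNSEndpoints) > 0 {
		populated++
	}
	if c.ResolutionNote != nil {
		populated++
	}
	if populated != 1 {
		return fmt.Errorf("expected exactly one populated field, got %d", populated)
	}
	return nil
}

func main() {
	fmt.Println(EndpointConfig{DNSEndpoints: []string{"10.0.0.1:80"}}.validate())
	fmt.Println(EndpointConfig{DNSEndpoints: []string{"10.0.0.1:80"}, ResolutionNote: errNoEndpoints}.validate())
}
```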

// including LDS, RDS, CDS and EDS and sends update once we have all the
// resources and sends an error when we get error in listener or route
// resources.
func New(listenername, dataplaneAuthority string, xdsClient xdsclient.XDSClient, watcher ConfigWatcher) *DependencyManager {

Contributor

s/listenername/listenerName

// resources and sends an error when we get error in listener or route
// resources.
func New(listenername, dataplaneAuthority string, xdsClient xdsclient.XDSClient, watcher ConfigWatcher) *DependencyManager {
// Builds the dependency manager and starts the listener watch.

Contributor

Nit: nix this comment, since it is obvious that you are creating the struct here. Also, the listener watch is not started here.

Comment on lines +110 to +112
// ConfigWatcher is notified of the XDSConfig resource updates and errors that
// are received by the xDS client from the management server. It only receives a
// XDSConfig update after all the xds resources have been received.

Contributor

How about:

// ConfigWatcher is the interface for consumers of aggregated xDS configuration
// from the DependencyManager. The only consumer of this configuration is
// currently the xDS resolver.

// ConfigWatcher is notified of the XDSConfig resource updates and errors that
// are received by the xDS client from the management server. It only receives a
// XDSConfig update after all the xds resources have been received.
type ConfigWatcher interface {

Contributor

Please consider moving this to the top of the file so that the methods of the DependencyManager stay together and are not mixed in with another type's definition.


func (m *DependencyManager) maybeSendUpdate() {
	if m.logger.V(2) {
		m.logger.Infof("Sending update to watcher: Listener: %v, RouteConfig: %v", pretty.ToJSON(m.currentListenerUpdate), pretty.ToJSON(m.currentRouteConfig))

Contributor

We've had many performance problems with using pretty.ToJSON for printing structs. I would recommend using %+v or some other native formatting directive instead.

Another thing to consider is whether the xDS resolver also outputs this log. If so, we don't want the same information repeated twice.
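
For comparison, here is how the native `fmt` verbs render a small struct. The `listenerUpdate` type is a hypothetical stand-in and not the real `xdsresource.ListenerUpdate`:

```go
package main

import "fmt"

// listenerUpdate is a hypothetical stand-in for a resource update struct.
type listenerUpdate struct {
	RouteConfigName string
	HTTPFilters     int
}

// formats renders u with the three common native verbs.
func formats(u listenerUpdate) (plain, withNames, goSyntax string) {
	return fmt.Sprintf("%v", u), fmt.Sprintf("%+v", u), fmt.Sprintf("%#v", u)
}

func main() {
	plain, withNames, goSyntax := formats(listenerUpdate{RouteConfigName: "route-a", HTTPFilters: 3})
	fmt.Println(plain)     // {route-a 3}
	fmt.Println(withNames) // {RouteConfigName:route-a HTTPFilters:3}
	fmt.Println(goSyntax)  // Go-syntax form: adds the type name and quotes strings
}
```

`%+v` is often the most readable choice for logs because it includes field names without the extra noise of `%#v`.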

Contributor

easwars left a comment

I haven't looked at the tests yet. But I guess these comments will give you enough to make progress.

type ConfigWatcher interface {
	// OnUpdate is invoked by the dependency manager to provide a new,
	// validated xDS configuration to the watcher.
	OnUpdate(xdsresource.XDSConfig)

Contributor

Given that we are changing the resource watcher APIs to accept pointers to resource update structs, it would make more sense to store pointers to them in the XDSConfig struct as well.

And continuing in that same vein, we could pass a pointer to the XDSConfig struct here. Also, it would make sense to document that the XDSConfig is read-only for the watcher and must not be modified.
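
One possible shape for the interface with a pointer parameter and a documented read-only contract is sketched below. `XDSConfig` is reduced to a single hypothetical field, and `recordingWatcher` is an illustrative implementation only:

```go
package main

import "fmt"

// XDSConfig is a simplified, hypothetical stand-in.
type XDSConfig struct{ ListenerName string }

// ConfigWatcher is implemented by consumers of the aggregated configuration.
type ConfigWatcher interface {
	// OnUpdate delivers a new configuration. The pointed-to XDSConfig is
	// read-only: implementations must not modify it.
	OnUpdate(cfg *XDSConfig)
	// OnError reports a failure to produce a configuration.
	OnError(err error)
}

// recordingWatcher keeps the last configuration and counts errors.
type recordingWatcher struct {
	last *XDSConfig
	errs int
}

func (w *recordingWatcher) OnUpdate(cfg *XDSConfig) { w.last = cfg }
func (w *recordingWatcher) OnError(error)           { w.errs++ }

// Compile-time check that recordingWatcher satisfies the interface.
var _ ConfigWatcher = (*recordingWatcher)(nil)

func main() {
	w := &recordingWatcher{}
	var cw ConfigWatcher = w
	cw.OnUpdate(&XDSConfig{ListenerName: "listener-a"})
	fmt.Println(w.last.ListenerName, w.errs)
}
```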

// OnError is invoked when an error is received in listener or route
// resource. This includes cases where:
// - The listener or route resource watcher reports a resource error.
// - The received listener resource is a socket listener, not an API listener - TODO : This is not yet implemented, tracked here #8114

Contributor

You could generalize this and specify that any resource validation performed by the DependencyManager that fails also leads to OnError being invoked on the watcher.

OnError(error)
}

func (m *DependencyManager) maybeSendUpdate() {

Contributor

Why is this called maybeSendUpdate? Under what conditions will it not send an update? Can this be captured in its docstring?
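
A docstring along these lines might answer the question. The conditions shown (listener, route configuration, and virtual host all present) are an assumption about the intended behaviour, and the types are simplified stand-ins:

```go
package main

import "fmt"

// xdsConfig and dependencyManager are simplified, hypothetical stand-ins.
type xdsConfig struct {
	Listener, RouteConfig, VirtualHost string
}

type dependencyManager struct {
	listener    *string // nil until a listener resource is received
	routeConfig *string // nil until a route configuration is received
	virtualHost *string // nil until a virtual host is matched
	onUpdate    func(xdsConfig)
}

// maybeSendUpdate notifies the watcher only when the listener, the route
// configuration, and the matching virtual host are all available. It sends
// nothing and returns false while any of them is still missing.
func (m *dependencyManager) maybeSendUpdate() bool {
	if m.listener == nil || m.routeConfig == nil || m.virtualHost == nil {
		return false
	}
	m.onUpdate(xdsConfig{Listener: *m.listener, RouteConfig: *m.routeConfig, VirtualHost: *m.virtualHost})
	return true
}

func main() {
	lis, rc, vh := "listener-a", "route-a", "vhost-a"
	m := &dependencyManager{onUpdate: func(c xdsConfig) { fmt.Printf("%+v\n", c) }}
	m.listener = &lis
	fmt.Println(m.maybeSendUpdate()) // route config still missing
	m.routeConfig, m.virtualHost = &rc, &vh
	fmt.Println(m.maybeSendUpdate())
}
```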

// Only executed in the context of a serializer callback.
func (m *DependencyManager) onListenerResourceUpdate(update *xdsresource.ListenerUpdate) {
	if m.logger.V(2) {
		m.logger.Infof("Received update for Listener resource %q: %v", m.ldsResourceName, pretty.ToJSON(update))

Contributor

For all these usages of pretty.ToJSON, please consider switching them to native formatting directives. Experiment with a few of them, like %v, %+v, and %#v, and see which one provides the best output.

}

func (m *DependencyManager) applyRouteConfigUpdate(update xdsresource.RouteConfigUpdate) {
matchVh := xdsresource.FindBestMatchingVirtualHost(m.dataplaneAuthority, update.VirtualHosts)

Contributor

s/matchVh/matchVH to comply with Go initialisms.

// Only executed in the context of a serializer callback.
func (m *DependencyManager) onListenerResourceError(err error) {
	if m.logger.V(2) {
		m.logger.Infof("Received resource error for Listener resource %q: %v", m.ldsResourceName, err)

Contributor

I believe we have some code in the xDS client to ensure that the returned errors contain the xDS node ID. Could you please ensure that that property still holds. Thanks.

Comment on lines +160 to +164
m.rdsResourceName = ""
if m.routeConfigWatcher != nil {
	m.routeConfigWatcher.stop()
	m.routeConfigWatcher = nil
}

Contributor

Don't we have to set the matching virtual host to nil here?

It would be nice if we have a method to do all the cleanup when a listener resource error or a listener resource update invalidates the previously received route config. I see similar code in onListenerResourceError, but that one sets the matching virtual host to nil as well.
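
A shared cleanup helper could look roughly like the sketch below. The field names follow the snippets quoted in this review, while the helper name `resetRouteState`, the simplified types, and the decision to also clear `currentRouteConfig` are assumptions:

```go
package main

import "fmt"

// routeWatcher is a hypothetical stand-in for the RDS watcher.
type routeWatcher struct{ stopped bool }

func (w *routeWatcher) stop() { w.stopped = true }

type dependencyManager struct {
	rdsResourceName    string
	routeConfigWatcher *routeWatcher
	currentRouteConfig *string
	currentVirtualHost *string
}

// resetRouteState drops everything derived from a previously received route
// configuration. It is meant to be called both when a listener update
// invalidates the route config and when a listener resource error arrives.
func (m *dependencyManager) resetRouteState() {
	m.rdsResourceName = ""
	if m.routeConfigWatcher != nil {
		m.routeConfigWatcher.stop()
		m.routeConfigWatcher = nil
	}
	m.currentRouteConfig = nil
	m.currentVirtualHost = nil
}

func main() {
	w := &routeWatcher{}
	rc, vh := "route-a", "vhost-a"
	m := &dependencyManager{rdsResourceName: "route-a", routeConfigWatcher: w, currentRouteConfig: &rc, currentVirtualHost: &vh}
	m.resetRouteState()
	fmt.Println(w.stopped, m.rdsResourceName == "", m.currentVirtualHost == nil)
}
```

Having one helper keeps the two call sites from drifting apart, which is the inconsistency pointed out above.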

m.rdsResourceName = ""
m.currentVirtualHost = nil
m.routeConfigWatcher = nil
m.watcher.OnError(status.Errorf(codes.Unavailable, "Listener resource error : %v", err))

Contributor

Again, if the watcher is going to be given status errors, we need to document that clearly, along with which status codes are returned when, and why.

Comment on lines +218 to +221
if m.rdsResourceName != resourceName {
	// Drop updates from canceled watchers.
	return
}

Contributor

I'm guessing we are going to need code like this for cluster and endpoint watchers as well. Can we make this part of the watcher instead?
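
One way to move the check into the watcher is for the watcher to remember that it was stopped and to drop callbacks itself. The sketch below uses hypothetical names and a string payload, and assumes that `stop` and the callbacks run on the same serializer, so no locking is shown:

```go
package main

import "fmt"

// routeConfigWatcher is a hypothetical watcher that filters its own
// callbacks once it has been stopped.
type routeConfigWatcher struct {
	resourceName string
	stopped      bool
	deliver      func(resourceName, update string)
}

func (w *routeConfigWatcher) stop() { w.stopped = true }

// ResourceChanged forwards the update unless the watcher was canceled, so
// the parent no longer needs to compare resource names.
func (w *routeConfigWatcher) ResourceChanged(update string) {
	if w.stopped {
		return
	}
	w.deliver(w.resourceName, update)
}

func main() {
	w := &routeConfigWatcher{
		resourceName: "route-a",
		deliver:      func(name, update string) { fmt.Println(name, update) },
	}
	w.ResourceChanged("v1") // delivered
	w.stop()
	w.ResourceChanged("v2") // dropped
}
```

The same pattern would apply unchanged to cluster and endpoint watchers.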

Comment on lines +231 to +234
//If update is not for the current watcher
if m.rdsResourceName != resourceName {
	return
}

Contributor

Same here.

Labels

Area: xDS (Includes everything xDS related, including LB policies used with xDS.)
Type: Internal Cleanup (Refactors, etc)

3 participants