dcaputo-harmoni
Problem

The ClickHouse operator generates incorrect hostnames in remote_servers.xml configuration, causing DNS resolution failures and breaking distributed operations in multi-node clustered deployments (sharded and/or replicated setups).

Issue Details:

  • Generated hostnames: chi-db-clickhouse-db-{shard}-{replica} (e.g., chi-db-clickhouse-db-0-0)
  • Actual pod names: chi-db-clickhouse-db-{shard}-{replica}-{ordinal} (e.g., chi-db-clickhouse-db-0-0-0)

This mismatch causes all nodes to show is_local=0 in system.clusters, breaking distributed operations and ON CLUSTER commands.

Root Cause

The operator was not following Kubernetes StatefulSet DNS naming conventions. StatefulSets use a specific DNS pattern:

<pod-name>.<headless-service-name>.<namespace>.svc.cluster.local

The createPodFQDN function was incorrectly using createPodHostname() (service name) instead of createPodName() (actual pod name with -0 ordinal suffix). While the service name would work for network connectivity, ClickHouse's is_local detection requires the hostname in remote_servers.xml to exactly match the pod's actual hostname for proper cluster node identification.

Solution: Fixed Both Hostname Generation Functions

Modified both createPodHostname and createPodFQDN functions in CHI and CHK namers:

1. Fixed createPodHostname()

Before (broken): Returned service name without ordinal

// Old logic - returned service name, not pod name
return n.createStatefulSetServiceName(host)  // chi-clickhouse-clickhouse-0-0

After (fixed): Returns actual pod name with ordinal

// New logic - returns actual pod name that matches StatefulSet
return n.createPodName(host)  // chi-clickhouse-clickhouse-0-0-0

2. Fixed createPodFQDN()

Before (broken): Used service name in FQDN

// Old logic - used hostname (service name) in FQDN
return fmt.Sprintf("pattern", n.createPodHostname(host), ...)

After (fixed): Uses proper StatefulSet DNS pattern

// New logic - proper StatefulSet DNS pattern
// <pod-name>.<headless-service-name>.<namespace>.svc.cluster.local
return fmt.Sprintf(
    "%s.%s.%s.svc.cluster.local",
    n.createPodName(host),                    // chi-clickhouse-clickhouse-0-0-0
    n.createStatefulSetServiceName(host),     // chi-clickhouse-clickhouse-0-0  
    host.GetRuntime().GetAddress().GetNamespace(),
)

This ensures both functions return pod names that match actual StatefulSet pod hostnames, enabling proper is_local detection and DNS resolution.

Files Changed:

  • pkg/model/chi/namer/name.go - Implemented proper StatefulSet DNS pattern for CHI
  • pkg/model/chk/namer/name.go - Implemented proper StatefulSet DNS pattern for CHK

Compatibility with namespaceDomainPattern

This fix is fully compatible with the existing namespaceDomainPattern functionality. When users specify a custom domain pattern like:

spec:
  namespaceDomainPattern: "%s.svc.my.test"

The implementation properly handles both cases:

  • Default: <pod-name>.<headless-service-name>.<namespace>.svc.cluster.local
  • Custom domain: <pod-name>.<headless-service-name>.<custom-domain-pattern>

The %s placeholder in namespaceDomainPattern gets replaced with the namespace name, maintaining full backward compatibility while fixing the underlying DNS resolution issues.

Impact

  • Fixes DNS resolution: Proper StatefulSet DNS patterns resolve correctly
  • Fixes ON CLUSTER operations: Distributed DDL now works in sharded configurations
  • Resolves is_local=0 issue: All cluster nodes correctly identify themselves as local
  • Eliminates manual workarounds: No need for custom extraConfig remote_servers overrides
  • Stops operator log spam: Eliminates "The host X-X is outside of the cluster" messages
  • Fixes cluster detection logic: Operator now correctly identifies hosts within clusters
  • Maintains full backward compatibility: Works with existing configurations
  • Supports custom domains: Compatible with namespaceDomainPattern overrides
  • Works for both CHI and CHK: ClickHouse and ClickHouse Keeper deployments
  • Production tested: Verified with real distributed queries and inter-node communication

Operator Log Messages Explained

This fix resolves continuous operator log messages like:

The host 0-1 is outside of the cluster

These occur because the operator's IsHostInCluster() function queries:

SELECT count() FROM system.clusters WHERE cluster='clickhouse' AND is_local

When hostnames mismatch, this always returns 0 (no local node found), causing the operator to repeatedly log that hosts are "outside" the cluster even when they're functioning correctly.

Testing & Validation

Production Tested:

  • ✅ Distributed table creation and queries across multiple shards
  • ✅ ON CLUSTER DDL operations on a 4-node cluster (2 shards, 2 replicas each)
  • ✅ Inter-node data distribution and aggregation
  • ✅ Cluster-wide system table queries
  • ✅ Custom namespaceDomainPattern compatibility
  • ✅ Both secure and non-secure cluster configurations

Technical Details: StatefulSet DNS Pattern Implementation

The fix implements the standard Kubernetes StatefulSet DNS pattern by ensuring FQDNs follow:

<pod-name>.<headless-service-name>.<namespace>.svc.cluster.local

Key Components:

  • Pod Name: chi-clickhouse-clickhouse-0-0-0 (includes -0 ordinal)
  • Headless Service: chi-clickhouse-clickhouse-0-0 (StatefulSet service)
  • Domain: <namespace>.svc.cluster.local (or custom via namespaceDomainPattern)

This ensures proper DNS resolution for StatefulSet pods while maintaining compatibility with all existing cluster configurations and custom domain patterns.

Verification

After this fix, users will no longer need manual extraConfig overrides. The operator automatically generates correct hostnames in remote_servers.xml that match actual StatefulSet pod names and DNS patterns.

Example of corrected hostname generation:

  • Before (broken): chi-db-clickhouse-db-0-0.namespace.svc.cluster.local
  • After (working): chi-db-clickhouse-db-0-0-0.chi-db-clickhouse-db-0-0.namespace.svc.cluster.local

This change ensures ClickHouse can properly identify local replicas and enables all distributed operations to work correctly out of the box with proper StatefulSet DNS resolution.

@sunsingerus sunsingerus added research required This issue requires additional research planned for review This feature is planned for review labels Sep 30, 2025
@sunsingerus sunsingerus changed the base branch from master to 0.25.5 September 30, 2025 15:37
@alex-zaitsev (Member)

Thank you @dcaputo-harmoni for the detailed analysis and fix. It seems to be correct. We are concerned, however, that we do not see the problems you are referring to, so it might be something specific to your Kubernetes cluster network setup.

The fix may also break monitoring metrics (see failed test), since labels will likely change.

@dcaputo-harmoni (Author) commented Oct 1, 2025

@alex-zaitsev Thanks for the feedback! The reason I submitted this change is that the current hostname logic doesn’t line up with what Kubernetes documents as the standard for StatefulSet pods behind a headless service. The canonical pattern is <pod-name>.<headless-service-name>.<namespace>.svc.cluster.local, and without the headless service component the pods don’t resolve reliably across clusters. In my setup the shorter form caused is_local mismatches because ClickHouse compared against the full FQDN, which meant the node treated all replicas as remote. Adding the headless service reference fixed that consistently, and should work in all clusters. As far as I can tell this isn’t a new Kubernetes behavior - it has always been the intended pattern - but some clusters (and perhaps yours) may have DNS search paths or custom CoreDNS rules that make the shorter form work, which would explain why it hasn’t been an issue everywhere.

I understand the concern about metrics labels breaking. To avoid disruption, we could introduce a useLegacyLabels (or similar) configuration setting. By default the operator would use the canonical FQDN form, but enabling the flag would preserve the old label format for clusters that depend on it. That way the change is opt-in and not silently breaking, while still giving a clear migration path toward the Kubernetes-standard convention. Let me know your thoughts.

@alex-zaitsev (Member)

@dcaputo-harmoni , makes sense. We always use CoreDNS, but I think it resolves short names by default.

A broken pod name label for monitoring is probably not a big deal. I am more concerned about an upgrade that will result in a complete re-generation of remote_servers.

I've just realized that the real problem is in replicated tables. The replica name is stored in (Zoo)Keeper, and when it changes, ClickHouse will lose it. So at the very least we need to keep the replica macro unchanged.

We need to do some more testing and decide whether it is safe to include in 0.25.5 or better to wait for the next major release.

As a side note, it has always been confusing to me that ClickHouse itself reports a different host name compared to what is in remote_servers (though that did not stop it from resolving is_local properly).

Thanks again.

@dcaputo-harmoni (Author)

@alex-zaitsev all sounds good to me, and yeah the hostname issue was a real head scratcher for me too. We are running these changes in our production environment now via a custom built image, and everything has been working well - but I'll continue to follow this PR, and just let me know if I can be of assistance in any way!
