Skip to content

Cloudflare provider: harden Custom Hostname authorization and cache plan lookups #6391

@AndrewCharlesHay

Description

@AndrewCharlesHay

Was reading through the Cloudflare provider for a security angle and hit two things worth flagging. They're in the same file set and share the same shape (ingress/service annotations influencing what the operator's Cloudflare account gets asked to do), so filing as one issue. Happy to split if you'd rather.

1. Custom Hostname annotation lets tenants request hostnames on the operator's account

Files: provider/cloudflare/cloudflare_custom_hostnames.go:230-236, provider/cloudflare/cloudflare.go:716-718

With --cloudflare-custom-hostnames on, anyone who can create an Ingress/Service/Gateway in a watched namespace can put

external-dns.alpha.kubernetes.io/cloudflare-custom-hostname: login.victim.example.com

on their resource and external-dns will call CustomHostnames.New() against the operator's Cloudflare account with that hostname. The custom_origin_server is the endpoint's DNSName, i.e. the tenant's own workload:

// newCustomHostname
return customHostname{
    hostname:           hostname,            // from the tenant's annotation
    customOriginServer: origin,              // ep.DNSName, also tenant-controlled
    ssl:                getCustomHostnamesSSLOptions(p.CustomHostnamesConfig),
}

On a multi-tenant cluster with Cloudflare for SaaS, that's a hostname-takeover primitive. The tenant doesn't need to own victim.example.com; they just need to be able to complete DV on whatever they put in the annotation. In shared-infrastructure setups (internal platform teams, SaaS landlord/tenant) DV is often trivially doable.

Yes, Cloudflare still requires DV before routing traffic, and yes the feature is off by default. But "off by default" isn't much of a control once someone turns it on, because nothing in the provider scopes what a tenant is allowed to ask for.

Things that would help:

  • A --cloudflare-custom-hostnames-allowlist (suffix list or regex). When the feature is on and no allowlist is set, reject everything (fail closed).
  • Or a per-namespace opt-in: a cluster-admin-set label gating whether annotations in that namespace are honored.
  • Either way, the flag help text deserves a sentence about the multi-tenant surface. Right now it reads like a purely additive feature.

I can send a PR for the allowlist shape if you're open to it.

2. ZoneHasPaidPlan is uncached, called per-change, reachable via annotation

Files: provider/cloudflare/cloudflare.go:194-213, provider/cloudflare/cloudflare.go:733-735

// newCloudFlareChange
if len(comment) > freeZoneMaxCommentLength {
    comment = p.DNSRecordsConfig.trimAndValidateComment(ep.DNSName, comment, p.ZoneHasPaidPlan)
}

ZoneHasPaidPlan does a ListZones (paginated) plus a GetZone every time it runs, no cache. It's called via trimAndValidateComment whenever a record comment is longer than 100 chars, and the record comment is set by external-dns.alpha.kubernetes.io/cloudflare-record-comment, which any Ingress author can put on their resource.

So a user with create rights in a watched namespace puts a 101-char comment on N ingresses, and external-dns fires 2N Cloudflare API calls per sync cycle just to re-decide the truncation policy for the same zone. Once the operator's account gets rate-limited, real DNS changes come back as SoftError and the cluster's DNS stops converging until Cloudflare cools off.

Caching the plan per zone for the duration of a sync would kill this (plan changes are on the order of hours or days, not seconds). A map of zone ID to bool with a short TTL is enough.

It's also worth bounding comment length at the source/annotation layer so the provider never sees anything over paidZoneMaxCommentLength to begin with.

The rest

These were the two HIGH-severity items from a broader review. Medium and low findings (token-file size bound, bearer-token redaction in SDK error strings, substring-based rate-limit classification, shouldBeProxied swallowing malformed values) I can file separately if you'd like.

Filing public because (1) needs both the feature flag and passing DV, and (2) is a reliability bug. If you'd rather see either moved to kubernetes.io/security, I'll pull the custom-hostname one and resubmit there.

cc @njuettner @hjacobs @Raffo per SECURITY_CONTACTS.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions