A live demo for the AI Engineer Summit. Two AI agents (recon + action) drive real cloud resources through the stackql MCP server, using SQL as the only interface to Cloudflare and Confluent Kafka. The talking point: infrastructure as data.
Target run time: ~3 minutes recorded; the agent loop itself completes in ~30-60 seconds.
- `loadgen.py` drives synthetic + AI-crawler-UA traffic at `stackql.xyz` (a throwaway Cloudflare zone) so analytics have something real to report
- recon agent SELECTs against Cloudflare GraphQL Analytics + the active rate-limit rule + the active Kafka cluster
- action agent UPDATEs the rate-limit threshold (Cloudflare) and INSERTs a decision record (Confluent Kafka)
- Records are visible live in the Confluent Cloud UI Messages tab
The two-stack split is forced by Confluent's auth surface: control-plane keys can't hit data-plane endpoints, and data-plane keys can't exist until the cluster does.
Stack 1 — infrastructure/control-plane/ (provider: confluent)
- `kafka_cluster`: managed BASIC cluster, single AZ
- `service_account`: principal for data-plane writes
- `sa_cluster_admin`: role binding (CloudClusterAdmin). MUST run before `cluster_api_key` so the vended key inherits perms at mint time
- `cluster_api_key`: vended cluster-scoped key, secret captured via `return_vals.create` + `RETURNING *` (then unpacked in iql exports via `{{ this.kafka_api_key_spec_json }}`)
- Stack-level exports surface cluster ids + the vended creds to `.stackql-deploy-exports` (auto-written by stackql-deploy)
Phase 1.5 (bootstrap.sh) — polls pkc-*.../kafka/v3/clusters/*/topics
with the vended key until 200 (handles RBAC propagation delay, usually
1-2 attempts).
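
The polling step can be sketched as a small retry loop. This is an illustration only, not the contents of `bootstrap.sh`: `wait_for_data_plane`, `fetch_status`, and the attempt/delay defaults are hypothetical, and the HTTP call is injected so the sketch assumes nothing about the exact endpoint or auth headers.

```python
import time
from typing import Callable


def wait_for_data_plane(
    fetch_status: Callable[[], int],
    max_attempts: int = 10,
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call fetch_status until it returns 200; return the attempt count.

    fetch_status is a hypothetical callable that issues the topics request
    with the vended key and returns the HTTP status code.
    """
    for attempt in range(1, max_attempts + 1):
        if fetch_status() == 200:
            return attempt
        if attempt < max_attempts:
            sleep(delay_seconds)  # give RBAC time to propagate before retrying
    raise TimeoutError(f"data plane not ready after {max_attempts} attempts")
```

Injecting `fetch_status` and `sleep` keeps the loop testable without a live cluster.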
Stack 2 — infrastructure/data-plane/ (providers: kafka, cloudflare)
- `decision_log_topic`: Kafka topic via `kafka.kafka.topics`
- `canary_record`: non-empty bootstrap record via `kafka.kafka.records`
- `rate_limit_rule`: Cloudflare rate-limit via `cloudflare.rulesets.phases` (modern rulesets, NOT legacy; see gotchas)
Bootstrap orchestration: infrastructure/bootstrap.sh
- Runs stack 1, sources `.stackql-deploy-exports`, polls data plane, runs stack 2, with all credentials passed via `-e` flags.
Teardown: infrastructure/teardown.sh — tears down stack 1 only;
deleting the Kafka cluster cascades topic + records + ACLs on Confluent's
side. Cloudflare rate-limit needs separate cleanup if you care.
Shell-sourced (set -a; . .env; set +a). NOT loaded via python-dotenv —
that dependency has been removed from requirements.txt. The .env uses
shell variable expansion to alias the same Confluent key into the Kafka
provider's expected env vars where applicable (though for this demo we
ended up with separate keys per Confluent's design).
Required:
- `CONFLUENT_CLOUD_API_KEY`/`_SECRET`: Cloud key bound to "My account" with Global scope (OrganizationAdmin). For control plane only.
- `CONFLUENT_ENVIRONMENT_ID`
- `CLOUDFLARE_API_TOKEN`: needs Account WAF Write, Account Rulesets Write, Account Firewall Access Rules Write, Zone Firewall Services Edit, Zone WAF Edit, plus broad reads. Bot Fight Mode is toggled in UI.
- `CLOUDFLARE_ZONE_ID`
- `ANTHROPIC_API_KEY`: for `demo.py` only
KAFKA_API_KEY / KAFKA_API_SECRET are MINTED by bootstrap.sh (not
manually set) and propagated to stack 2 via the exports file.
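
A pre-flight check for the required variables might look like the sketch below. It is not part of the repo: `REQUIRED_VARS` and `missing_vars` are hypothetical names, and `CONFLUENT_CLOUD_API_SECRET` is assumed to be the expansion of the `_SECRET` shorthand in the list above. It only reads the environment, so it stays compatible with shell-sourcing and does not reintroduce `python-dotenv`.

```python
import os
from typing import List, Mapping

# Names from the "Required" list; the _SECRET expansion is an assumption.
REQUIRED_VARS = (
    "CONFLUENT_CLOUD_API_KEY",
    "CONFLUENT_CLOUD_API_SECRET",
    "CONFLUENT_ENVIRONMENT_ID",
    "CLOUDFLARE_API_TOKEN",
    "CLOUDFLARE_ZONE_ID",
    "ANTHROPIC_API_KEY",
)


def missing_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_vars()` before starting the agent loop gives one clear error instead of a failure partway through a run.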
These are real constraints learned the hard way. Don't relitigate:
- Never monkey-patch provider specs in `.stackql/src/`: those files are refreshed from the registry on pull. Fixes must go upstream in the provider repos.
- Never delete files with secret-shaped content (`*.env*`, `*.secret`, `*.key`, `*credential*`) without explicit instruction. I deleted `infrastructure/.env` once thinking it was redundant; it wasn't. The user had to regenerate every secret. Don't.
- Don't add `python-dotenv`: env is sourced via bash, not Python. Adding the dependency back will fragment env loading and the user will push back.
- Never amend git commits; always make new commits.
- Don't push without explicit user instruction.
- The Anthropic / Cloudflare / Confluent secrets in this conversation transcript should all be considered compromised — they were pasted in curl outputs and build logs. Treat them as rotation-pending. The user is aware and rotates manually after demo iterations.
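
The secret-shaped filename rule above can be expressed as a guard to run before any delete. This is a sketch under assumptions: `SECRET_PATTERNS` copies the four globs from the rule, while `is_secret_shaped` is a hypothetical helper that matches on the file name only, not on file contents.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Globs copied from the rule above.
SECRET_PATTERNS = ("*.env*", "*.secret", "*.key", "*credential*")


def is_secret_shaped(path: str) -> bool:
    """Return True if the final path component matches a secret-shaped glob."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in SECRET_PATTERNS)
```

A match means "stop and ask", not "safe to skip": a file that does not match can still hold credentials.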
- `auth:` blocks at resource level are silently ignored. Documented but non-functional. Don't try to use per-resource credential switching. Issue needs to be filed against `stackql-deploy-rs`.
- `protected:` redacts in `export_vars` but the RETURNING capture log line leaks the raw response. Look for `RETURNING [spec] for [X] captured as [this.Y] = [<unredacted JSON>]`. Issue needs to be filed.
- `/*+ createorupdate */` is the right anchor for resources where the underlying API verb is REPLACE/PUT (no separate create-vs-update). Don't use `/*+ update */`: stackql-deploy errors with "iql file must include either 'create' or 'createorupdate' anchor."
- Script resource `run:` blocks DO get templated. They run under `sh -c`. To export values from a script, print a JSON object to stdout with the keys matching `exports:` entries.
- Process env cannot be mutated mid-stackql-deploy. A script resource cannot export to the parent's env. Persist values to a file and source externally (e.g. `bootstrap.sh` between stacks).
- Stack exports only emit variables named in the manifest's top-level `exports:` list to `.stackql-deploy-exports`. They can reference any variable that was set by any resource, not just the last.
- `return_vals.create` captures RETURNING into resource-scoped context (`this.X`), accessible in the SAME resource's iql exports block via `{{ this.X }}`. Cross-resource scope (`{{ other_resource.X }}`) works in iql but NOT in inline `sql:` on a `type: query` resource.
- REPLACE follows UPDATE shape: `REPLACE <table> SET col = val WHERE ...` NOT `REPLACE INTO <table>(cols) SELECT ...`. Confirmed in `stackql-parser/go/vt/sqlparser/sql.y` (`update_or_replace: UPDATE | REPLACE`).
- JSON_EXTRACT is NOT supported in RETURNING projections. Capture the raw column (e.g. `RETURNING id, spec`) and JSON_EXTRACT in a downstream exports SELECT.
- No subqueries in DELETE: `DELETE FROM x WHERE id IN (SELECT ...)` errors.
- `JSON('[...]')` wrapper is REQUIRED for SET values whose schema type is array (and likely `object` too). Without it, the stackql naive translator sends `"col": "[...]"` (string-wrapped) instead of `"col": [...]` (parsed array). Cloudflare rejects with `400 invalid JSON: 'col' cannot be a string`. Confirmed by `test/robot/functional/stackql_mocked_from_cmd_line.robot` fixtures using `JSON('[ "SFTP" ]')` for AWS `data__Protocols`.
- `vw_*` views in some providers have a projection bug where multi-column SELECT combined with WHERE on the required-param column reports `could not locate symbol <col>`. Workaround: query the underlying raw table with `JSON_EXTRACT(spec, '$.<col>')` instead. Hit on `confluent.managed_kafka_clusters.vw_clusters`.
- Cloud API keys (god key bound to My account with Global scope) can NOT hit the data plane. Cloudflare-style "one key" doesn't work: Confluent rejects them at the Kafka REST endpoint with 401. The data plane only accepts cluster-scoped vended API keys minted via `confluent.iam.api_keys` with `spec.resource.id = <cluster_id>`.
- The vended key's permissions snapshot the principal's RBAC at MINT TIME. Grant `sa_cluster_admin` BEFORE creating `cluster_api_key` or the key is born unprivileged.
- Cluster API key secret is only in the create response. Subsequent SELECTs against `confluent.iam.api_keys` won't include it. Use `RETURNING *` + `return_vals.create: [{spec: <name>}]` to capture, unpack later. If you skip the create (idempotent re-run), the secret is gone; delete the API key in Confluent UI to re-mint.
- ksqlDB minimum is now 4 CSUs, ~$0.89/hour. Originally we planned to use it for the analysis story; we dropped it. Use Confluent Cloud UI Messages tab for live record inspection instead.
- `replication_factor` defaults to 3 on Confluent Cloud topic create; don't try to override on BASIC clusters.
- API key INSERT body uses `spec` (no `data__` prefix): the provider has `requestBodyTranslate: naive` so columns become top-level body keys.
- Legacy `/zones/{id}/rate_limits` API is in MAINTENANCE MODE. Reads work, writes return HTTP 403 `code 10037 ratelimit.api.maintenance_mode`. Don't go down this rabbit hole; use modern rulesets (`http_ratelimit` phase) instead. Confirmed with live curl on 2026-05-30.
- Modern rulesets PUT had a published provider bug where `id`, `version`, `last_updated` were marked required in the request body schema despite being `readOnly: true`. stackql forced them into both WHERE and body, Cloudflare rejected. Fix is upstream in `stackql-provider-cloudflare` (the user is patching it as of last conversation). After the fixed provider lands, `REPLACE cloudflare.rulesets.phases SET rules = '...' WHERE zone_id = ... AND ruleset_phase = 'http_ratelimit'` should work.
- Free Cloudflare plan constraints on rate-limit rules:
  - `period` MUST be 10 (not 60; that's paid-plan only)
  - `characteristics` must include BOTH `ip.src` AND `cf.colo.id`
  - `expression` must use `starts_with()` not `matches` (regex is paid-only)
- Cloudflare GraphQL Analytics requires Account-scoped Analytics Read permission on the token (zone-scoped is insufficient). The token has this.
- GraphQL provider operations land under top-10 namespaces:
  - `cloudflare.zones.http_requests_adaptive_groups`: main demo recon source
  - `cloudflare.firewall.firewall_events_adaptive_groups`: bot/threat flags
  - Plus 8 others wired by the user (`httpRequests1hGroups`, `firewallEventsAdaptive`, `httpRequestsOverviewAdaptiveGroups`, `dnsAnalyticsAdaptiveGroups`, `workersInvocationsAdaptive`, `r2OperationsAdaptiveGroups`, `d1AnalyticsAdaptiveGroups`, `cdnNetworkAnalyticsAdaptiveGroups`).
- Bot Fight Mode is enabled via UI (Security → Settings → Bot fight mode). The token doesn't have perm to toggle it programmatically.
- The beacon Worker (`fancy-boat-8ddc`) serves `test.stackql.xyz/beacon.gif` for organic traffic generation from microsite footers. Separate workstream from the main demo.
edgepilot/
├── .env                         # secrets, shell-source only (gitignored)
├── .env.example                 # template
├── .stackql-deploy-exports      # auto-written by stack 1, sourced by bootstrap.sh
├── stackql                      # local stackql binary (linux/x86_64)
├── stackql-deploy               # local stackql-deploy binary
├── demo.py                      # the 2-agent loop
├── loadgen.py                   # traffic gen for cloudflare zone
├── requirements.txt             # anthropic, mcp, aiohttp (NO python-dotenv)
├── claude_desktop_config.json   # MCP config for the Claude Desktop variant
├── README.md
├── SCRIPT.md                    # speaking script (needs rewrite, Gap 6)
├── CLAUDE.md                    # this file
├── infrastructure/
│   ├── bootstrap.sh             # provisions stack 1 then stack 2
│   ├── teardown.sh              # tears down stack 1 (cluster cascade)
│   ├── control-plane/
│   │   ├── stackql_manifest.yml
│   │   └── resources/
│   │       ├── kafka_cluster.iql
│   │       ├── service_account.iql
│   │       ├── sa_cluster_admin.iql
│   │       └── cluster_api_key.iql
│   ├── data-plane/
│   │   ├── stackql_manifest.yml
│   │   └── resources/
│   │       ├── decision_log_topic.iql
│   │       ├── canary_record.iql
│   │       └── rate_limit_rule.iql
│   └── assurance/
│       ├── 01_kafka_cluster.iql
│       ├── 02_service_account.iql
│       ├── 03_rate_limit.iql
│       └── README.md
└── .stackql/src/                # provider specs, DO NOT EDIT IN PLACE
    ├── confluent/v00.00.00000/  # local dev split (control plane only)
    ├── kafka/v00.00.00000/      # local dev split (data plane only)
    └── cloudflare/v26.05.00399/ # published, fix in progress upstream

- Cloudflare zone (`stackql.xyz`) exists, on free plan, Bot Fight Mode on
- Confluent environment exists (`env-...`)
- `.env` populated (use `.env.example` as template), shell-sourced
- `pip install -r requirements.txt` in a venv

set -a; . .env; set +a
bash infrastructure/bootstrap.sh

Takes ~7-10 min (mostly cluster provision). Idempotent for cluster, SA, and role binding. NOT idempotent for `cluster_api_key` (re-run = key already exists = no RETURNING = bootstrap fatals because the secret can't be recaptured). If you re-bootstrap, delete the API key in Confluent UI first.

python demo.py

Reads `CLOUDFLARE_ZONE_ID` + `CONFLUENT_ENVIRONMENT_ID` from env. The agents discover everything else at runtime via SELECT (query-before-mutate is the demo mantra).

bash infrastructure/teardown.sh

Cluster delete cascades data plane. Cloudflare rate-limit rule persists; re-bootstrap will overwrite it idempotently, or curl-delete manually.

./stackql exec -i infrastructure/assurance/01_kafka_cluster.iql
./stackql exec -i infrastructure/assurance/02_service_account.iql
./stackql exec -i infrastructure/assurance/03_rate_limit.iql

- Gap 2: `demo.py` validation. Hold until Cloudflare provider fix lands (in flight). Then run end-to-end, smooth log noise, validate the agent loop completes in <60s.
- Gap 3: Analysis queries (live SQL on stage to show the topic contents). Confluent UI is the primary surface; supplementary stackql queries optional.
- Gap 4: Claude Desktop variant. Tweak `claude_desktop_config.json` for Windows `stackql.exe`, document the swap in README.
- Gap 5: Teardown polish. Currently OK but `cluster_api_key` delete skips with "unresolved variables" message (id only available within same-run scope). Harmless because cluster delete cascades the key, but cosmetic improvement possible.
- Gap 6: Rewrite `SCRIPT.md` to match the actual demo flow after the whole chain works end-to-end. Current script references constructs that no longer exist.

- `stackql-deploy-rs`: `auth:` block on resources silently ignored.
- `stackql-deploy-rs`: `protected:` exports leak through RETURNING capture log line.
- `stackql-deploy-rs` (cosmetic): `cluster_api_key` teardown logs "unresolved variables, assuming resource does not exist, skipping" when called outside the same-run scope. Cluster cascade handles it functionally.
- `stackql-provider-confluent`: `vw_clusters` projection bug (multi-column SELECT + WHERE on required-param column fails).
- `stackql-provider-confluent`: `api_endpoint` column returns empty string for new Basic-tier clusters; downstream code should use `http_endpoint` instead. Either fix the upstream or update docs to deprecate the column.
- `stackql-provider-cloudflare`: rulesets PUT body schema marks readOnly fields as required (IN PROGRESS, user is patching).