Skip to content

Conversation

anlowee
Copy link
Member

@anlowee anlowee commented Jul 7, 2025

Description

As the title.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

Check the markdown file is rendered correctly on Github page.

Summary by CodeRabbit

  • New Features

    • Introduced comprehensive documentation for setting up and running Presto with clp-package on a local Kubernetes cluster using k3d and Helm.
    • Added Helm chart and configuration files for deploying Presto coordinator and worker nodes.
    • Provided ready-to-use configuration templates and scripts for service setup and IP address customization.
    • Included sample asset files and a script to streamline initial configuration for local deployments.
    • Added Docker Compose setup with environment configurations and scripts to run Presto coordinator and worker services locally.
    • Enabled dynamic configuration generation with automated scripts for both Kubernetes and Docker deployments.
  • Chores

    • Added ignore files to exclude unnecessary assets and system files from version control and Helm packaging.

Copy link

coderabbitai bot commented Jul 7, 2025

Walkthrough

This change introduces a new Kubernetes Helm chart for deploying Presto and clp-package services locally using k3d. It adds configuration files, deployment manifests, scripts, and documentation to enable users to set up a local cluster, configure Presto coordinator and worker nodes, and integrate clp-package datasets and services. Additionally, it provides a parallel Docker Compose setup with scripts and configuration templates for running Presto and clp-package in Docker containers.

Changes

File(s) Change Summary
yscope-k8s/README.md Added detailed setup and usage instructions for running Presto and clp-package on local Kubernetes with k3d and Helm.
yscope-k8s/.helmignore, yscope-k8s/.gitignore Added ignore files for Helm packaging and Git version control, specifying files/directories to exclude.
yscope-k8s/Chart.yaml, yscope-k8s/values.yaml Added Helm chart metadata and default deployment configuration values for Presto coordinator and worker.
yscope-k8s/templates/presto/presto-coordinator-config.yaml Added ConfigMap resource containing configuration files for the Presto coordinator.
yscope-k8s/templates/presto/presto-coordinator.yaml Added Kubernetes manifest for Presto coordinator pod, service, and init script ConfigMap.
yscope-k8s/templates/presto/presto-worker-config.yaml Added ConfigMap resource containing configuration files for the Presto worker.
yscope-k8s/templates/presto/presto-worker.yaml Added Deployment manifest and init script ConfigMap for Presto worker node.
yscope-k8s/demo-assets/clp-config.yml.bak, yscope-k8s/demo-assets/init.sh Added backup configuration file and script to set up clp-package config with dynamic IP replacement.
yscope-compose/README.md Added instructions for setting up a local Docker stack for Presto and clp-package using Docker Compose.
yscope-compose/.env Added environment variable definitions for Presto coordinator and worker configuration.
yscope-compose/docker-compose.yaml Added Docker Compose file defining Presto coordinator and worker services with volumes, networks, and environment variables.
yscope-compose/coordinator/config-template/, yscope-compose/worker/config-template/ Added configuration templates for Presto coordinator and worker components, including properties and JVM settings.
yscope-compose/coordinator/scripts/generate-configs.sh, yscope-compose/worker/scripts/generate-configs.sh Added shell scripts to generate Presto configuration files with environment variable substitution and dynamic version retrieval.
yscope-compose/coordinator/scripts/$ Added shell script to substitute environment variables in config templates and organize catalog files for Presto coordinator.
yscope-compose/demo-assets/clp-config.yml, yscope-compose/demo-assets/init.sh Added clp-package configuration file with service parameters and a script to generate sample config and update metadata environment variables.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant LocalMachine
    participant k3d
    participant Kubernetes
    participant Helm
    participant PrestoCoordinator
    participant PrestoWorker
    participant clp-package
    participant DockerCompose

    User->>LocalMachine: Install prerequisites (Docker, kubectl, k3d, Helm)
    User->>LocalMachine: Prepare clp-package dataset/config
    User->>LocalMachine: Run clp-package services (init.sh, etc.)
    User->>k3d: Create local Kubernetes cluster
    k3d->>Kubernetes: Provision cluster
    User->>Helm: Deploy Presto chart
    Helm->>Kubernetes: Apply manifests (coordinator, worker, configs)
    Kubernetes->>PrestoCoordinator: Start pod/init container
    PrestoCoordinator->>Kubernetes: Register service
    Kubernetes->>PrestoWorker: Start deployment/init container
    PrestoWorker->>PrestoCoordinator: Fetch version, configure node
    User->>Kubernetes: Port-forward Presto coordinator
    User->>PrestoCoordinator: Access WebUI / run CLI queries
    PrestoCoordinator->>clp-package: Query dataset via connector

    User->>LocalMachine: Alternatively, prepare Docker Compose environment
    User->>DockerCompose: Launch Presto coordinator and worker containers
    DockerCompose->>PrestoCoordinator: Start container with config generation
    DockerCompose->>PrestoWorker: Start container with config generation and version fetch
    User->>DockerCompose: Access Presto CLI inside coordinator container
Loading

Estimated code review effort

3 (Moderate) | ⏱️ ~40 minutes

Note

⚡️ Unit Test Generation is now available in beta!

Learn more here, or try it out under "Finishing Touches" below.

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai generate unit tests to generate unit tests for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8ea32d6 and 5f0eb8c.

📒 Files selected for processing (1)
  • yscope-k8s/README.md (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: anlowee
PR: y-scope/presto#7
File: .github/workflows/hive-tests.yml:33-35
Timestamp: 2025-06-04T14:50:43.554Z
Learning: In the y-scope/presto repository, GitHub workflows are extended from OSS Presto and should maintain consistent formatting with the upstream patterns, including YAML indentation styles for workflow steps.
yscope-k8s/README.md (1)
Learnt from: anlowee
PR: y-scope/presto#7
File: .github/workflows/hive-tests.yml:33-35
Timestamp: 2025-06-04T14:50:43.554Z
Learning: In the y-scope/presto repository, GitHub workflows are extended from OSS Presto and should maintain consistent formatting with the upstream patterns, including YAML indentation styles for workflow steps.
🪛 markdownlint-cli2 (0.17.2)
yscope-k8s/README.md

25-25: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


53-53: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


61-61: Multiple top-level headings in the same document

(MD025, single-title, single-h1)


64-64: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


101-101: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


101-101: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


114-114: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


114-114: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


131-131: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


131-131: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


144-144: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


144-144: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


149-149: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


149-149: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


165-165: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


165-165: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


170-170: Multiple top-level headings in the same document

(MD025, single-title, single-h1)


174-174: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


194-194: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


194-194: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


199-199: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


199-199: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


209-209: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


209-209: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


217-217: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


217-217: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


223-223: Files should end with a single newline character

(MD047, single-trailing-newline)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-aggregation-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-local-queries)
  • GitHub Check: test (8.0.442, :presto-main-base)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-tpch-distributed-queries)
  • GitHub Check: maven-checks (17.0.13)
  • GitHub Check: maven-checks (8.0.442)

Comment on lines 54 to 58
kubectl port-forward svc/presto-coordinator 8080:8080
# Check is coordinator alive
curl -X GET http://coordinator:8080/v1/info
# Check is worker connected to the coordinator
curl -X GET http://coordinator:8080/v1/nodes
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Service name mismatch breaks the port-forward sanity-check

The manifest defines the Service as coordinator, but the runbook forwards svc/presto-coordinator and the subsequent curl targets coordinator:8080. Neither will resolve on the host.
Align the names and use localhost after port-forwarding:

-kubectl port-forward svc/presto-coordinator 8080:8080
-# Check is coordinator alive
-curl -X GET http://coordinator:8080/v1/info
-# Check is worker connected to the coordinator
-curl -X GET http://coordinator:8080/v1/nodes
+kubectl port-forward svc/coordinator 8080:8080
+# Verify coordinator is alive
+curl -X GET http://localhost:8080/v1/info
+# Verify workers are registered
+curl -X GET http://localhost:8080/v1/nodes

Also consider renaming the Service in the YAML to presto-coordinator if that better matches your conventions; just keep the doc and manifest consistent.

Also applies to: 84-88


To confirm the Service name defined in your Kubernetes manifest, let’s search for all Service resources under yscope-k8s:


🏁 Script executed:

#!/bin/bash
# List all Service definitions and their names in the yscope-k8s directory
rg -n "kind: Service" -A5 yscope-k8s

Length of output: 263


Let’s verify the actual Kubernetes Service resource names in your manifests (not just the README) under yscope-k8s:


🏁 Script executed:

#!/bin/bash
# Search for Service definitions in all YAML files under yscope-k8s
rg -n "kind: Service" -A5 yscope-k8s --glob '*.yaml'

Length of output: 54


To locate where your Kubernetes Service manifests are defined, let’s list all YAML files under yscope-k8s:


🏁 Script executed:

#!/bin/bash
# Find all Kubernetes manifest files under yscope-k8s
find yscope-k8s -type f \( -iname '*.yaml' -o -iname '*.yml' \)

Length of output: 67


Align README port-forward commands with the Service name

The Service in your README snippet is defined as name: coordinator (lines 85–88), but the port-forward and curl commands target presto-coordinator and coordinator directly. This will fail at runtime. Update the commands to use the actual Service name and localhost:

• File: yscope-k8s/README.md
• Sections: lines 54–58 and 84–88

- kubectl port-forward svc/presto-coordinator 8080:8080
- # Check is coordinator alive
- curl -X GET http://coordinator:8080/v1/info
- # Check is worker connected to the coordinator
- curl -X GET http://coordinator:8080/v1/nodes
+ kubectl port-forward svc/coordinator 8080:8080
+ # Verify coordinator is alive
+ curl -X GET http://localhost:8080/v1/info
+ # Verify workers are registered
+ curl -X GET http://localhost:8080/v1/nodes

Alternatively, if you prefer the Service to be named presto-coordinator, update its metadata.name in the manifest snippet—and keep the docs in sync.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
kubectl port-forward svc/presto-coordinator 8080:8080
# Check is coordinator alive
curl -X GET http://coordinator:8080/v1/info
# Check is worker connected to the coordinator
curl -X GET http://coordinator:8080/v1/nodes
kubectl port-forward svc/coordinator 8080:8080
# Verify coordinator is alive
curl -X GET http://localhost:8080/v1/info
# Verify workers are registered
curl -X GET http://localhost:8080/v1/nodes
🤖 Prompt for AI Agents
In yscope-k8s/README.md around lines 54 to 58 and 84 to 88, the port-forward and
curl commands use the Service name `presto-coordinator` and `coordinator`
inconsistently with the actual Service metadata.name, causing runtime failures.
Fix this by either updating the port-forward and curl commands to use the
correct Service name `coordinator` and target localhost (e.g., `kubectl
port-forward svc/coordinator 8080:8080` and curl to `http://localhost:8080`), or
alternatively rename the Service metadata.name to `presto-coordinator` in the
manifest and keep the commands consistent with that name.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 20

🔭 Outside diff range comments (1)
yscope-k8s/values.yaml (1)

22-23: Superfluous trailing blank line

YAMLlint flagged the extra blank line; removing it keeps the chart lint-clean.

♻️ Duplicate comments (2)
yscope-k8s/README.md (2)

107-113: Update service name and target host in port-forward example

The manifest exposes the coordinator as service/presto-coordinator. After port-forwarding, curl should hit localhost, not the internal DNS name.

-kubectl port-forward service/presto-coordinator 8080:8080
+kubectl port-forward svc/presto-coordinator 8080:8080

Then browse http://localhost:8080.


1-26: Keep a single H1 and surround headings with blank lines

Multiple # headings violate MD025 and hurt document structure. Downgrade secondary titles to ## and insert blank lines before/after each heading.

-# Setup local K8s cluster for presto + clp
+# Setup local K8s cluster for Presto + CLP-## Install docker
+## Install Docker

Repeat for all subsequent top-level headings (# Launch clp-package, # Create k8s Cluster, # Working with helm chart, etc.).

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5f0eb8c and 5bae0ce.

⛔ Files ignored due to path filters (2)
  • yscope-k8s/mongod-256MB-presto-clp.log.tar.gz is excluded by !**/*.gz
  • yscope-k8s/presto-cli-0.293-executable.jar is excluded by !**/*.jar
📒 Files selected for processing (11)
  • yscope-k8s/.helmignore (1 hunks)
  • yscope-k8s/Chart.yaml (1 hunks)
  • yscope-k8s/README.md (1 hunks)
  • yscope-k8s/templates/object-store/aws-credentials.yaml (1 hunks)
  • yscope-k8s/templates/object-store/bucket-creation.yaml (1 hunks)
  • yscope-k8s/templates/object-store/minio.yaml (1 hunks)
  • yscope-k8s/templates/presto/presto-coordinator-config.yaml (1 hunks)
  • yscope-k8s/templates/presto/presto-coordinator.yaml (1 hunks)
  • yscope-k8s/templates/presto/presto-worker-config.yaml (1 hunks)
  • yscope-k8s/templates/presto/presto-worker.yaml (1 hunks)
  • yscope-k8s/values.yaml (1 hunks)
🧰 Additional context used
🧠 Learnings (7)
📓 Common learnings
Learnt from: anlowee
PR: y-scope/presto#7
File: .github/workflows/hive-tests.yml:33-35
Timestamp: 2025-06-04T14:50:43.554Z
Learning: In the y-scope/presto repository, GitHub workflows are extended from OSS Presto and should maintain consistent formatting with the upstream patterns, including YAML indentation styles for workflow steps.
yscope-k8s/values.yaml (1)
Learnt from: anlowee
PR: y-scope/presto#7
File: .github/workflows/hive-tests.yml:33-35
Timestamp: 2025-06-04T14:50:43.554Z
Learning: In the y-scope/presto repository, GitHub workflows are extended from OSS Presto and should maintain consistent formatting with the upstream patterns, including YAML indentation styles for workflow steps.
yscope-k8s/templates/presto/presto-worker-config.yaml (1)
Learnt from: anlowee
PR: y-scope/presto#7
File: .github/workflows/hive-tests.yml:33-35
Timestamp: 2025-06-04T14:50:43.554Z
Learning: In the y-scope/presto repository, GitHub workflows are extended from OSS Presto and should maintain consistent formatting with the upstream patterns, including YAML indentation styles for workflow steps.
yscope-k8s/templates/presto/presto-coordinator-config.yaml (1)
Learnt from: wraymo
PR: y-scope/presto#15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
yscope-k8s/templates/presto/presto-worker.yaml (1)
Learnt from: anlowee
PR: y-scope/presto#7
File: .github/workflows/hive-tests.yml:33-35
Timestamp: 2025-06-04T14:50:43.554Z
Learning: In the y-scope/presto repository, GitHub workflows are extended from OSS Presto and should maintain consistent formatting with the upstream patterns, including YAML indentation styles for workflow steps.
yscope-k8s/templates/presto/presto-coordinator.yaml (1)
Learnt from: anlowee
PR: y-scope/presto#7
File: .github/workflows/hive-tests.yml:33-35
Timestamp: 2025-06-04T14:50:43.554Z
Learning: In the y-scope/presto repository, GitHub workflows are extended from OSS Presto and should maintain consistent formatting with the upstream patterns, including YAML indentation styles for workflow steps.
yscope-k8s/README.md (1)
Learnt from: anlowee
PR: y-scope/presto#7
File: .github/workflows/hive-tests.yml:33-35
Timestamp: 2025-06-04T14:50:43.554Z
Learning: In the y-scope/presto repository, GitHub workflows are extended from OSS Presto and should maintain consistent formatting with the upstream patterns, including YAML indentation styles for workflow steps.
🪛 Checkov (3.2.334)
yscope-k8s/templates/object-store/aws-credentials.yaml

[LOW] 1-8: The default namespace should not be used

(CKV_K8S_21)

yscope-k8s/templates/presto/presto-coordinator-config.yaml

[LOW] 12-13: Base64 High Entropy String

(CKV_SECRET_6)

🪛 YAMLlint (1.37.1)
yscope-k8s/values.yaml

[error] 22-22: too many blank lines (1 > 0)

(empty-lines)

yscope-k8s/templates/object-store/minio.yaml

[error] 5-5: too many spaces inside braces

(braces)


[error] 5-5: too many spaces inside braces

(braces)


[error] 6-6: too many spaces inside braces

(braces)


[error] 6-6: too many spaces inside braces

(braces)


[error] 9-9: too many spaces inside braces

(braces)


[error] 9-9: too many spaces inside braces

(braces)


[error] 21-21: too many spaces inside empty braces

(braces)


[error] 26-26: too many spaces inside braces

(braces)


[error] 26-26: too many spaces inside braces

(braces)


[error] 28-28: too many spaces inside braces

(braces)


[error] 28-28: too many spaces inside braces

(braces)


[error] 33-33: too many spaces inside braces

(braces)


[error] 33-33: too many spaces inside braces

(braces)


[error] 34-34: too many spaces inside braces

(braces)


[error] 34-34: too many spaces inside braces

(braces)


[error] 36-36: too many spaces inside braces

(braces)


[error] 36-36: too many spaces inside braces

(braces)


[error] 37-37: too many spaces inside braces

(braces)


[error] 37-37: too many spaces inside braces

(braces)


[error] 39-39: too many spaces inside braces

(braces)


[error] 39-39: too many spaces inside braces

(braces)


[error] 39-39: no new line character at the end of file

(new-line-at-end-of-file)

yscope-k8s/templates/presto/presto-worker-config.yaml

[error] 24-24: no new line character at the end of file

(new-line-at-end-of-file)

yscope-k8s/templates/object-store/bucket-creation.yaml

[error] 27-27: too many spaces after colon

(colons)

yscope-k8s/templates/presto/presto-worker.yaml

[error] 4-4: too many spaces inside braces

(braces)


[error] 4-4: too many spaces inside braces

(braces)


[error] 6-6: too many spaces inside braces

(braces)


[error] 6-6: too many spaces inside braces

(braces)


[error] 8-8: too many spaces inside braces

(braces)


[error] 8-8: too many spaces inside braces

(braces)


[error] 11-11: too many spaces inside braces

(braces)


[error] 11-11: too many spaces inside braces

(braces)


[error] 15-15: too many spaces inside braces

(braces)


[error] 15-15: too many spaces inside braces

(braces)


[error] 34-34: too many spaces inside braces

(braces)


[error] 34-34: too many spaces inside braces

(braces)


[error] 35-35: too many spaces inside braces

(braces)


[error] 35-35: too many spaces inside braces

(braces)


[error] 53-53: trailing spaces

(trailing-spaces)

yscope-k8s/templates/presto/presto-coordinator.yaml

[warning] 9-9: comment not indented like content

(comments-indentation)


[warning] 10-10: missing starting space in comment

(comments)


[warning] 11-11: missing starting space in comment

(comments)


[error] 55-55: too many spaces inside braces

(braces)


[error] 55-55: too many spaces inside braces

(braces)


[error] 56-56: too many spaces inside braces

(braces)


[error] 56-56: too many spaces inside braces

(braces)


[error] 79-79: too many blank lines (1 > 0)

(empty-lines)

🪛 LanguageTool
yscope-k8s/README.md

[uncategorized] ~112-~112: Possible missing comma found.
Context: ...al laptop, to access the Presto's WebUI by e.g., http://localhost:8080 To use pre...

(AI_HYDRA_LEO_MISSING_COMMA)

🪛 markdownlint-cli2 (0.17.2)
yscope-k8s/README.md

26-26: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


26-26: Multiple top-level headings in the same document

(MD025, single-title, single-h1)


27-27: Lists should be surrounded by blank lines

(MD032, blanks-around-lists)


31-31: Lists should be surrounded by blank lines

(MD032, blanks-around-lists)


32-32: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


73-73: Ordered list item prefix
Expected: 1; Actual: 4; Style: 1/1/1

(MD029, ol-prefix)


73-73: Lists should be surrounded by blank lines

(MD032, blanks-around-lists)


74-74: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


79-79: Ordered list item prefix
Expected: 1; Actual: 5; Style: 1/1/1

(MD029, ol-prefix)


79-79: Lists should be surrounded by blank lines

(MD032, blanks-around-lists)


80-80: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


85-85: Ordered list item prefix
Expected: 1; Actual: 6; Style: 1/1/1

(MD029, ol-prefix)


87-87: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


87-87: Multiple top-level headings in the same document

(MD025, single-title, single-h1)


89-89: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


94-94: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


94-94: Multiple top-level headings in the same document

(MD025, single-title, single-h1)


95-95: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Above

(MD022, blanks-around-headings)


95-95: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


106-106: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


106-106: Trailing punctuation in heading
Punctuation: ':'

(MD026, no-trailing-punctuation)


108-108: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


112-112: Bare URL used

(MD034, no-bare-urls)


115-115: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


120-120: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


120-120: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


124-124: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


125-125: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


129-129: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


129-129: Multiple top-level headings in the same document

(MD025, single-title, single-h1)


130-130: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


134-134: Multiple consecutive blank lines
Expected: 1; Actual: 2

(MD012, no-multiple-blanks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
  • GitHub Check: test (17.0.13, :presto-main)
  • GitHub Check: test (17.0.13, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-aggregation-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-resource-manager)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-local-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-plan-determinism)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-resource-manager)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-tpch-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-local-queries)
  • GitHub Check: maven-checks (17.0.13)
  • GitHub Check: maven-checks (8.0.442)
🔇 Additional comments (5)
yscope-k8s/templates/object-store/minio.yaml (1)

1-22: Use Deployment instead of bare Pod for resiliency

A single Pod provides no restart or rolling-upgrade semantics. Switching to a Deployment (or StatefulSet if persistence is added) enables self-healing and replica management.

[ suggest_essential_refactor ]

yscope-k8s/.helmignore (1)

24-24: Looks good – sensible ignore list

No issues spotted; the list follows Helm best practices.

yscope-k8s/Chart.yaml (1)

1-25: Chart metadata OK

Fields conform to Helm v2 schema; nothing to change.

yscope-k8s/templates/presto/presto-coordinator-config.yaml (1)

24-25: discovery.uri duplicates scheme & port logic already templated

Because the Service is always in‐cluster, consider templating only the host and reusing the port value to avoid accidental drift:

discovery.uri=http://{{ .Values.presto.coordinator.serviceName }}:{{ .Values.presto.coordinator.httpPort }}
yscope-k8s/templates/presto/presto-coordinator.yaml (1)

12-15: Init-container command array needs a shell

Busybox expects ["sh","-c", …] when executing a script path; otherwise it tries to exec the file directly and fails if it lacks executable bit in the image layer. Either:

-      command: ["/scripts/generate-configs.sh"]
+      command: ["sh","-c","/scripts/generate-configs.sh"]

or ensure the script is chmod +x within the ConfigMap (defaultMode 0755 already helps).

Comment on lines 7 to 8
data:
credentials: "W2RlZmF1bHRdCmF3c19hY2Nlc3Nfa2V5X2lkID0gbWluaW9hZG1pbgphd3Nfc2VjcmV0X2FjY2Vzc19rZXkgPSBtaW5pb2FkbWluCg=="
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Static credentials committed to VCS – potential secret leak

The secret contains real base64-encoded keys (minioadmin). Once merged, the data is permanently recorded in git history. Replace with templated placeholders and document generating the secret locally.

-  credentials: "W2RlZmF1bHRdCmF3c19hY2Nlc3Nfa2V5X2lkID0gbWluaW9hZG1pbgphd3Nfc2VjcmV0X2FjY2Vzc19rZXkgPSBtaW5pb2FkbWluCg=="
+  # Base64 of a generated ~/.aws/credentials file
+  credentials: {{ .Values.objectStore.minio.awsCredentials | b64enc | quote }}

Add the key in values.yaml and mark it with # pragma: allowlist secret to silence secret-scanner noise.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
data:
credentials: "W2RlZmF1bHRdCmF3c19hY2Nlc3Nfa2V5X2lkID0gbWluaW9hZG1pbgphd3Nfc2VjcmV0X2FjY2Vzc19rZXkgPSBtaW5pb2FkbWluCg=="
data:
# Base64 of a generated ~/.aws/credentials file
credentials: {{ .Values.objectStore.minio.awsCredentials | b64enc | quote }}
🧰 Tools
🪛 Checkov (3.2.334)

[LOW] 1-8: The default namespace should not be used

(CKV_K8S_21)

🤖 Prompt for AI Agents
In yscope-k8s/templates/object-store/aws-credentials.yaml at lines 7-8, the
base64-encoded static credentials containing real keys are committed, risking
secret leaks. Replace the hardcoded base64 string with templated placeholders
that reference values from values.yaml. Then, add the actual secret keys in
values.yaml with the comment # pragma: allowlist secret to document and allow
secret scanning exceptions. Also, update documentation to instruct generating
and injecting secrets locally instead of committing them.

Comment on lines 18 to 21
image: "ghcr.io/anlowee/presto/prestissimo-with-clp-connector-runtime:ubuntu-22.04"
serviceName: "presto-worker"
httpPort: 8080
replicaCount: 1
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Minor naming inconsistency

You use replicaCount (camelCase) here but plain replicas in several upstream Presto charts. Aligning with common naming makes swapping values files easier.

🤖 Prompt for AI Agents
In yscope-k8s/values.yaml around lines 18 to 21, the field name replicaCount
uses camelCase, but upstream Presto charts use replicas in lowercase. Rename
replicaCount to replicas to align with common naming conventions and improve
compatibility with upstream values files.

Comment on lines 30 to 38
type: ClusterIP
ports:
- name: "api"
port: {{ .Values.objectStore.minio.apiPort }}
targetPort: {{ .Values.objectStore.minio.apiPort }}
- name: "console"
port: {{ .Values.objectStore.minio.consolePort }}
targetPort: {{ .Values.objectStore.minio.consolePort }}
selector:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Health probes missing

Adding livenessProbe and readinessProbe against the health endpoint (/minio/health/ready) improves observability and prevents routing to an unready server.

🧰 Tools
🪛 YAMLlint (1.37.1)

[error] 33-33: too many spaces inside braces

(braces)


[error] 33-33: too many spaces inside braces

(braces)


[error] 34-34: too many spaces inside braces

(braces)


[error] 34-34: too many spaces inside braces

(braces)


[error] 36-36: too many spaces inside braces

(braces)


[error] 36-36: too many spaces inside braces

(braces)


[error] 37-37: too many spaces inside braces

(braces)


[error] 37-37: too many spaces inside braces

(braces)

🤖 Prompt for AI Agents
In yscope-k8s/templates/object-store/minio.yaml around lines 30 to 38, the
service definition is missing livenessProbe and readinessProbe configurations.
Add both probes to the pod spec, targeting the /minio/health/ready HTTP endpoint
on the appropriate port, to ensure Kubernetes can monitor the health and
readiness of the Minio server and avoid routing traffic to unhealthy or unready
instances.

Comment on lines +41 to +54
volumes:
- name: "presto-worker-config-script-volume"
configMap:
name: "presto-worker-config-script"
defaultMode: 0755
- name: "presto-worker-config-dir-volume"
emptyDir: {}
- name: "presto-worker-config-volume"
configMap:
name: "presto-worker-config"
- name: "clp-archives-volume"
hostPath:
path: "/var/data/archives"
---
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Tighten hostPath usage & remove trailing space

  1. Line 53 has a stray trailing space that breaks yamllint.
  2. Using hostPath without a type can be dangerous on cloud providers. Prefer:
-        - name: "clp-archives-volume"
-          hostPath:
-            path: "/var/data/archives" 
+        - name: "clp-archives-volume"
+          hostPath:
+            path: "/var/data/archives"
+            type: DirectoryOrCreate

This prevents the pod from failing on nodes where the directory is missing.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
volumes:
- name: "presto-worker-config-script-volume"
configMap:
name: "presto-worker-config-script"
defaultMode: 0755
- name: "presto-worker-config-dir-volume"
emptyDir: {}
- name: "presto-worker-config-volume"
configMap:
name: "presto-worker-config"
- name: "clp-archives-volume"
hostPath:
path: "/var/data/archives"
---
volumes:
- name: "presto-worker-config-script-volume"
configMap:
name: "presto-worker-config-script"
defaultMode: 0755
- name: "presto-worker-config-dir-volume"
emptyDir: {}
- name: "presto-worker-config-volume"
configMap:
name: "presto-worker-config"
- name: "clp-archives-volume"
hostPath:
path: "/var/data/archives"
type: DirectoryOrCreate
---
🧰 Tools
🪛 YAMLlint (1.37.1)

[error] 53-53: trailing spaces

(trailing-spaces)

🤖 Prompt for AI Agents
In yscope-k8s/templates/presto/presto-worker.yaml around lines 41 to 54, remove
the trailing space after the path "/var/data/archives" on line 53 to fix
yamllint errors. Additionally, add a 'type' field under hostPath, such as
'DirectoryOrCreate', to ensure the directory exists or is created, preventing
pod failures on nodes missing the directory.

Comment on lines +4 to +15
name: {{ .Values.presto.worker.serviceName }}
labels:
app: {{ .Values.presto.worker.serviceName }}
spec:
replicas: {{ .Values.presto.worker.replicaCount }}
selector:
matchLabels:
app: {{ .Values.presto.worker.serviceName }}
template:
metadata:
labels:
app: {{ .Values.presto.worker.serviceName }}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Trim Helm-template whitespace to satisfy YAML linters

yamllint is flagging “too many spaces inside braces” for the placeholders. Helm does not care, but the warnings pollute CI and may mask real issues.

-  name: {{ .Values.presto.worker.serviceName }}
+  name: {{.Values.presto.worker.serviceName}}
 ...
-    app: {{ .Values.presto.worker.serviceName }}
+    app: {{.Values.presto.worker.serviceName}}

Apply the same pattern everywhere in the file (lines 4, 6, 8, 11, 15, 34-35).
If you prefer the spaced style, silence the rule with # yamllint disable-line rule:braces.

🧰 Tools
🪛 YAMLlint (1.37.1)

[error] 4-4: too many spaces inside braces

(braces)


[error] 4-4: too many spaces inside braces

(braces)


[error] 6-6: too many spaces inside braces

(braces)


[error] 6-6: too many spaces inside braces

(braces)


[error] 8-8: too many spaces inside braces

(braces)


[error] 8-8: too many spaces inside braces

(braces)


[error] 11-11: too many spaces inside braces

(braces)


[error] 11-11: too many spaces inside braces

(braces)


[error] 15-15: too many spaces inside braces

(braces)


[error] 15-15: too many spaces inside braces

(braces)

🤖 Prompt for AI Agents
In yscope-k8s/templates/presto/presto-worker.yaml around lines 4 to 15, the Helm
template placeholders have extra spaces inside the braces causing yamllint
warnings for "too many spaces inside braces." Remove the extra spaces inside all
double curly braces for the placeholders on these lines and also on lines 34-35
to comply with YAML linting rules. Alternatively, if you want to keep the
spaces, add a yamllint disable comment on those lines to suppress the warnings.

Comment on lines +17 to +27
initContainers:
- name: "generate-configs"
image: "busybox"
command: ["/scripts/generate-configs.sh"]
volumeMounts:
- name: "presto-worker-config-script-volume"
mountPath: "/scripts"
- name: "presto-worker-config-dir-volume"
mountPath: "/opt/presto-server/etc"
- name: "presto-worker-config-volume"
mountPath: "/configs"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Pin the init-container image and verify script path

  1. image: "busybox" pulls the mutable latest tag, which can break determinism.
  2. BusyBox’ wget does not recognise --timeout; it uses -T. The script will silently fail to hit the coordinator.
-          image: "busybox"
+          image: "busybox:1.36.1"   # or whichever version you’ve validated
 ...
-          command: ["/scripts/generate-configs.sh"]
+          command: ["/bin/sh", "/scripts/generate-configs.sh"]

Then change the script (lines 86-87) to use -T 10 or switch to curl which is present in busybox:uclibc.
Failing to address this will keep the pod in CrashLoopBackOff until the back-off maxes out.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
initContainers:
- name: "generate-configs"
image: "busybox"
command: ["/scripts/generate-configs.sh"]
volumeMounts:
- name: "presto-worker-config-script-volume"
mountPath: "/scripts"
- name: "presto-worker-config-dir-volume"
mountPath: "/opt/presto-server/etc"
- name: "presto-worker-config-volume"
mountPath: "/configs"
initContainers:
- name: "generate-configs"
image: "busybox:1.36.1" # pinned to a specific, validated version
command: ["/bin/sh", "/scripts/generate-configs.sh"]
volumeMounts:
- name: "presto-worker-config-script-volume"
mountPath: "/scripts"
- name: "presto-worker-config-dir-volume"
mountPath: "/opt/presto-server/etc"
- name: "presto-worker-config-volume"
mountPath: "/configs"
🤖 Prompt for AI Agents
In yscope-k8s/templates/presto/presto-worker.yaml around lines 17 to 27, the
init-container image "busybox" is not pinned to a specific version, which risks
pulling a mutable latest tag and breaking determinism. Update the image to a
specific version tag to ensure stability. Additionally, the script at lines
86-87 uses wget with the unsupported --timeout option; modify the script to use
wget with the correct -T 10 option or replace wget with curl from busybox:uclibc
to prevent silent failures and pod CrashLoopBackOff errors.

Comment on lines +60 to +71
generate-configs.sh: |
#!/bin/sh

PRESTO_CONFIG_DIR="/opt/presto-server/etc"

# Setup the config directory hierarchy
mkdir -p ${PRESTO_CONFIG_DIR}/catalog

# Copy over files
cp /configs/* ${PRESTO_CONFIG_DIR}
mv ${PRESTO_CONFIG_DIR}/clp.properties ${PRESTO_CONFIG_DIR}/catalog

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Harden the copy/move logic in the init script

If clp.properties is absent the mv will abort the script despite set -e not being enabled.
Add strict error handling up-front and use mv -f to avoid race conditions:

-#!/bin/sh
+#!/bin/sh
+set -euo pipefail
 ...
-mv ${PRESTO_CONFIG_DIR}/clp.properties ${PRESTO_CONFIG_DIR}/catalog
+mv -f "${PRESTO_CONFIG_DIR}/clp.properties" "${PRESTO_CONFIG_DIR}/catalog/"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
generate-configs.sh: |
#!/bin/sh
PRESTO_CONFIG_DIR="/opt/presto-server/etc"
# Setup the config directory hierarchy
mkdir -p ${PRESTO_CONFIG_DIR}/catalog
# Copy over files
cp /configs/* ${PRESTO_CONFIG_DIR}
mv ${PRESTO_CONFIG_DIR}/clp.properties ${PRESTO_CONFIG_DIR}/catalog
generate-configs.sh: |
#!/bin/sh
set -euo pipefail
PRESTO_CONFIG_DIR="/opt/presto-server/etc"
# Setup the config directory hierarchy
mkdir -p ${PRESTO_CONFIG_DIR}/catalog
# Copy over files
cp /configs/* ${PRESTO_CONFIG_DIR}
mv -f "${PRESTO_CONFIG_DIR}/clp.properties" "${PRESTO_CONFIG_DIR}/catalog/"
🤖 Prompt for AI Agents
In yscope-k8s/templates/presto/presto-worker.yaml around lines 60 to 71, the
init script's copy and move commands lack strict error handling and do not use
the force option for moving clp.properties, which can cause the script to abort
if the file is missing. Add 'set -e' at the start of the script to enable strict
error handling and change the 'mv' command to 'mv -f' to force the move and
avoid race conditions.

sbin/compress.sh --timestamp-key 't.dollar_sign_date' datasets/mongod-256MB-processed.log
```

6. Use a JetBrain IDE to connect the database source. The database is `clp-db`, the user is `clp-user` and the password is in `etc/credential.yml`. Then modify the `archive_storage_directory` field in `clp_datasets` table to `/var/data/archives/default`, and submit the change.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's automate this with a script.

@@ -0,0 +1,158 @@
apiVersion: "batch/v1"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need this at the moment since we are launching log-viewer via the package.

@@ -0,0 +1,39 @@
apiVersion: "v1"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need this at the moment since we are launching log-viewer via the package.

@@ -0,0 +1,22 @@
# Configure object storage
objectStore:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this section

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I also remove the templates/minio.yaml?

.idea/
*.tmproj
.vscode/
clp/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is unnecessary here

.idea/
*.tmproj
.vscode/
clp/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any large file such as mongod-256MB-presto-clp.log.tar.gz and presto-cli-0.293-executable.jar need to be put into .helmignore. This is also the case for things that are extracted and put into the helm folder, we need to add it to helmignore.

An alternative is to explicitly ask user to decompress logs to /tmp using tar -xf xxx -C /tmp and explain why and what to do afterwards.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean just add them to the .helmignore or delete them from the repo?


2. Untar it.

3. Replace the content of `etc/clp-config.yml` with the following (also replace the IP address `${REPLACE_IP}` with the actual IP address of the host that you are running the clp-package):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we put this file somewhere, then use a oneliner command to get and replace the ip address

- name: "presto-coordinator-config-volume"
configMap:
name: "presto-coordinator-config"
# TODO: add CLP data directory volume host path
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove the TODO

name: "{{ .Values.presto.coordinator.serviceName }}"
spec:
initContainers:
# - name: "start-clp-package"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clean up the config

@anlowee anlowee force-pushed the xwei/add-k8s-manual-doc branch from 97cf141 to 80ec62d Compare July 17, 2025 19:39
@anlowee anlowee requested a review from jackluo923 July 17, 2025 19:41
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 12

♻️ Duplicate comments (10)
yscope-k8s/.helmignore (1)

26-28: Add explicit patterns for large binary artefacts (e.g., JARs & TAR archives).

Large files such as presto-cli-*.jar and *.log.tar.gz that are already in the repo will still be packaged into the chart because they’re not covered by the current patterns. This bloats the resulting .tgz and slows down every Helm operation.

 # Demo assets
 demo-assets/
+# Large binaries – keep the chart lean
+presto-cli-*.jar
+*.log.tar.gz
+*.tar.gz
yscope-k8s/values.yaml (1)

11-11: replicaCount vs replicas naming mismatch (upstream compatibility)
Same concern was raised earlier; consider aligning with upstream Presto charts.

yscope-k8s/templates/object-store/minio.yaml (3)

10-14: Shell path /bin/bash unavailable in MinIO image
This was flagged previously; switch to /bin/sh.


8-18: Health probes are still absent – add liveness & readiness checks
Same comment as before, see earlier suggestion.


8-18: Default root credentials remain exposed – populate via Secret
See prior review; recommended secret-based override still applies.

yscope-k8s/templates/presto/presto-coordinator.yaml (2)

1-7: Prefer a Deployment over a naked Pod
Self-healing and rolling updates are lost with a bare Pod. Converting this to a Deployment (or StatefulSet if stable hostnames are required) aligns with k8s best practices.


50-52: Strip spaces inside Helm braces to satisfy yamllint

-      port: {{ .Values.presto.coordinator.httpPort }}
-      targetPort: {{ .Values.presto.coordinator.httpPort }}
+      port: {{.Values.presto.coordinator.httpPort}}
+      targetPort: {{.Values.presto.coordinator.httpPort}}

This removes the braces error and prevents Helm strict-mode failures.

yscope-k8s/templates/object-store/bucket-creation.yaml (3)

11-12: Pin the AWS CLI image for reproducibility
amazon/aws-cli:latest may introduce breaking changes without notice. Pin to a specific version (e.g., amazon/aws-cli:2.15.15) or digest.


4-30: Add backoffLimit to shorten failure loops
Without an explicit backoffLimit, k8s will retry the Job for ~2 h. Setting it to 1 gives faster feedback during local dev.

 spec:
   template:
     spec:
       …
   restartPolicy: "Never"
+  backoffLimit: 1

26-27: Fix double space after colon

-            - name:  "scripts-volume"
+            - name: "scripts-volume"

Resolves colons lint error.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5bae0ce and 80ec62d.

⛔ Files ignored due to path filters (2)
  • yscope-k8s/demo-assets/mongod-256MB-presto-clp.log.tar.gz is excluded by !**/*.gz
  • yscope-k8s/demo-assets/presto-cli-0.293-executable.jar is excluded by !**/*.jar
📒 Files selected for processing (9)
  • yscope-k8s/.gitignore (1 hunks)
  • yscope-k8s/.helmignore (1 hunks)
  • yscope-k8s/README.md (1 hunks)
  • yscope-k8s/demo-assets/clp-config.yml.bak (1 hunks)
  • yscope-k8s/demo-assets/init.sh (1 hunks)
  • yscope-k8s/templates/object-store/bucket-creation.yaml (1 hunks)
  • yscope-k8s/templates/object-store/minio.yaml (1 hunks)
  • yscope-k8s/templates/presto/presto-coordinator.yaml (1 hunks)
  • yscope-k8s/values.yaml (1 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: anlowee
PR: y-scope/presto#7
File: .github/workflows/hive-tests.yml:33-35
Timestamp: 2025-06-04T14:50:43.554Z
Learning: In the y-scope/presto repository, GitHub workflows are extended from OSS Presto and should maintain consistent formatting with the upstream patterns, including YAML indentation styles for workflow steps.
yscope-k8s/values.yaml (1)
Learnt from: anlowee
PR: y-scope/presto#7
File: .github/workflows/hive-tests.yml:33-35
Timestamp: 2025-06-04T14:50:43.554Z
Learning: In the y-scope/presto repository, GitHub workflows are extended from OSS Presto and should maintain consistent formatting with the upstream patterns, including YAML indentation styles for workflow steps.
yscope-k8s/README.md (1)
Learnt from: anlowee
PR: y-scope/presto#7
File: .github/workflows/hive-tests.yml:33-35
Timestamp: 2025-06-04T14:50:43.554Z
Learning: In the y-scope/presto repository, GitHub workflows are extended from OSS Presto and should maintain consistent formatting with the upstream patterns, including YAML indentation styles for workflow steps.
yscope-k8s/templates/object-store/bucket-creation.yaml (1)
Learnt from: anlowee
PR: y-scope/presto#7
File: .github/workflows/hive-tests.yml:33-35
Timestamp: 2025-06-04T14:50:43.554Z
Learning: In the y-scope/presto repository, GitHub workflows are extended from OSS Presto and should maintain consistent formatting with the upstream patterns, including YAML indentation styles for workflow steps.
yscope-k8s/templates/presto/presto-coordinator.yaml (1)
Learnt from: anlowee
PR: y-scope/presto#7
File: .github/workflows/hive-tests.yml:33-35
Timestamp: 2025-06-04T14:50:43.554Z
Learning: In the y-scope/presto repository, GitHub workflows are extended from OSS Presto and should maintain consistent formatting with the upstream patterns, including YAML indentation styles for workflow steps.
🪛 YAMLlint (1.37.1)
yscope-k8s/values.yaml

[error] 12-12: too many blank lines (1 > 0)

(empty-lines)

yscope-k8s/templates/object-store/minio.yaml

[error] 4-4: too many spaces inside braces

(braces)


[error] 4-4: too many spaces inside braces

(braces)


[error] 5-5: too many spaces inside braces

(braces)


[error] 5-5: too many spaces inside braces

(braces)


[error] 8-8: too many spaces inside braces

(braces)


[error] 8-8: too many spaces inside braces

(braces)


[error] 20-20: too many spaces inside empty braces

(braces)


[error] 25-25: too many spaces inside braces

(braces)


[error] 25-25: too many spaces inside braces

(braces)


[error] 27-27: too many spaces inside braces

(braces)


[error] 27-27: too many spaces inside braces

(braces)


[error] 32-32: too many spaces inside braces

(braces)


[error] 32-32: too many spaces inside braces

(braces)


[error] 33-33: too many spaces inside braces

(braces)


[error] 33-33: too many spaces inside braces

(braces)


[error] 35-35: too many spaces inside braces

(braces)


[error] 35-35: too many spaces inside braces

(braces)


[error] 36-36: too many spaces inside braces

(braces)


[error] 36-36: too many spaces inside braces

(braces)


[error] 38-38: too many spaces inside braces

(braces)


[error] 38-38: too many spaces inside braces

(braces)


[error] 38-38: no new line character at the end of file

(new-line-at-end-of-file)

yscope-k8s/templates/object-store/bucket-creation.yaml

[error] 26-26: too many spaces after colon

(colons)

yscope-k8s/templates/presto/presto-coordinator.yaml

[error] 51-51: too many spaces inside braces

(braces)


[error] 51-51: too many spaces inside braces

(braces)


[error] 52-52: too many spaces inside braces

(braces)


[error] 52-52: too many spaces inside braces

(braces)


[error] 75-75: too many blank lines (1 > 0)

(empty-lines)

🪛 LanguageTool
yscope-k8s/README.md

[uncategorized] ~45-~45: Possible missing article found.
Context: ...hat the worker can find the archives in right place: ```bash # Install mysql-client i...

(AI_HYDRA_LEO_MISSING_THE)


[uncategorized] ~77-~77: Possible missing comma found.
Context: ...al laptop, to access the Presto's WebUI by e.g., http://localhost:8080 To use pre...

(AI_HYDRA_LEO_MISSING_COMMA)

🪛 markdownlint-cli2 (0.17.2)
yscope-k8s/README.md

26-26: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


26-26: Multiple top-level headings in the same document

(MD025, single-title, single-h1)


27-27: Lists should be surrounded by blank lines

(MD032, blanks-around-lists)


33-33: Lists should be surrounded by blank lines

(MD032, blanks-around-lists)


34-34: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


39-39: Ordered list item prefix
Expected: 1; Actual: 5; Style: 1/1/1

(MD029, ol-prefix)


39-39: Lists should be surrounded by blank lines

(MD032, blanks-around-lists)


40-40: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


45-45: Ordered list item prefix
Expected: 1; Actual: 6; Style: 1/1/1

(MD029, ol-prefix)


45-45: Lists should be surrounded by blank lines

(MD032, blanks-around-lists)


46-46: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


53-53: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


53-53: Multiple top-level headings in the same document

(MD025, single-title, single-h1)


55-55: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


59-59: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


59-59: Multiple top-level headings in the same document

(MD025, single-title, single-h1)


60-60: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Above

(MD022, blanks-around-headings)


60-60: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


71-71: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


71-71: Trailing punctuation in heading
Punctuation: ':'

(MD026, no-trailing-punctuation)


73-73: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


77-77: Bare URL used

(MD034, no-bare-urls)


80-80: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


85-85: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


85-85: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


89-89: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


90-90: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


94-94: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


94-94: Multiple top-level headings in the same document

(MD025, single-title, single-h1)


95-95: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


99-99: Multiple consecutive blank lines
Expected: 1; Actual: 2

(MD012, no-multiple-blanks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-aggregation-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-resource-manager)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-local-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-local-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: test (8.0.442, :presto-main)
  • GitHub Check: test (17.0.13, :presto-tests -P presto-tests-general)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-main-base)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
  • GitHub Check: maven-checks (8.0.442)
  • GitHub Check: maven-checks (17.0.13)

Comment on lines +32 to +38
webui:
host: "localhost"
port: 6000
logging_level: "INFO"
log_viewer_webui:
host: "localhost"
port: 6006
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Consider parameterizing webui and log_viewer_webui hosts as well

Most services rely on the ${REPLACE_IP} placeholder, but the two UI blocks are hard-coded to localhost. If the cluster is accessed remotely (e.g., via kubectl port-forward or an ingress), these addresses may become invalid. Exposing them as the same placeholder keeps the template fully relocatable.

-webui:
-  host: "localhost"
+webui:
+  host: "${REPLACE_IP}"
 ...
-log_viewer_webui:
-  host: "localhost"
+log_viewer_webui:
+  host: "${REPLACE_IP}"
🤖 Prompt for AI Agents
In yscope-k8s/demo-assets/clp-config.yml.bak around lines 32 to 38, the webui
and log_viewer_webui host values are hardcoded as "localhost". To make the
configuration relocatable and consistent with other services using the
${REPLACE_IP} placeholder, replace the "localhost" strings with the
${REPLACE_IP} placeholder for both webui and log_viewer_webui host entries.

@@ -0,0 +1 @@
demo-assets/clp-config.yml
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Add companion ignore rules for derived or editor artefacts

You already ignore the generated clp-config.yml. It may be worth also excluding its backup variants and common editor swap files to keep the repo clean:

 demo-assets/clp-config.yml
+demo-assets/clp-config.yml.bak
+*.swp
+*.tmp
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
demo-assets/clp-config.yml
demo-assets/clp-config.yml
demo-assets/clp-config.yml.bak
*.swp
*.tmp
🤖 Prompt for AI Agents
In yscope-k8s/.gitignore at line 1, add ignore rules for backup and editor swap
files related to clp-config.yml, such as clp-config.yml~, .clp-config.yml.swp,
and other common temporary files, to keep the repository clean from derived or
editor artefacts.

Comment on lines +13 to +15
cp "${FILE}.bak" "$FILE"

sed -i "s|\${REPLACE_IP}|$IP|g" "$FILE"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

sed -i is not portable across macOS vs GNU sed

The current one-argument form works on GNU sed; macOS requires an empty string after -i. Consider a small portability shim:

if sed --version >/dev/null 2>&1; then
  sed -i "s|${REPLACE_IP}|$IP|g" "$FILE"
else
  sed -i '' "s|${REPLACE_IP}|$IP|g" "$FILE"
fi

This prevents the script from failing on contributors’ Macs.

🤖 Prompt for AI Agents
In yscope-k8s/demo-assets/init.sh around lines 13 to 15, the sed command uses
the -i option without a backup suffix, which is not portable between GNU sed and
macOS sed. To fix this, add a conditional check for GNU sed by running sed
--version and use the one-argument -i option if true; otherwise, use the
macOS-compatible form with an empty string after -i. Replace the current sed
line with this conditional logic to ensure compatibility on both Linux and macOS
systems.

Comment on lines +1 to +2
#!/usr/bin/env bash

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Enable safer Bash defaults

Fail fast on any unexpected error or unset variable to avoid silently producing a partially-edited config.

 #!/usr/bin/env bash
+set -euo pipefail
🤖 Prompt for AI Agents
In yscope-k8s/demo-assets/init.sh at lines 1 to 2, the script lacks safer Bash
defaults that ensure it fails fast on errors or unset variables. Add "set -euo
pipefail" after the shebang line to enable these safer defaults, which will
cause the script to exit immediately on errors, unset variables, or failed
pipeline commands.

Comment on lines +11 to +12
replicaCount: 1

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Remove superfluous blank line to satisfy YAMLlint

     replicaCount: 1
-
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
replicaCount: 1
replicaCount: 1
🧰 Tools
🪛 YAMLlint (1.37.1)

[error] 12-12: too many blank lines (1 > 0)

(empty-lines)

🤖 Prompt for AI Agents
In yscope-k8s/values.yaml at lines 11 to 12, there is an unnecessary blank line
after the replicaCount setting. Remove this extra blank line to comply with
YAMLlint formatting rules and ensure the file is properly formatted.

Comment on lines 1 to 7
kind: "Pod"
metadata:
labels:
app: {{ .Values.objectStore.minio.serviceName }}
name: {{ .Values.objectStore.minio.serviceName }}
spec:
containers:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

apiVersion field is missing for the Pod – manifest will fail to apply

Every Kubernetes resource requires apiVersion. Add it above kind:

+apiVersion: v1
 kind: "Pod"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
kind: "Pod"
metadata:
labels:
app: {{ .Values.objectStore.minio.serviceName }}
name: {{ .Values.objectStore.minio.serviceName }}
spec:
containers:
apiVersion: v1
kind: "Pod"
metadata:
labels:
app: {{ .Values.objectStore.minio.serviceName }}
name: {{ .Values.objectStore.minio.serviceName }}
spec:
containers:
🧰 Tools
🪛 YAMLlint (1.37.1)

[error] 4-4: too many spaces inside braces

(braces)


[error] 4-4: too many spaces inside braces

(braces)


[error] 5-5: too many spaces inside braces

(braces)


[error] 5-5: too many spaces inside braces

(braces)

🤖 Prompt for AI Agents
In yscope-k8s/templates/object-store/minio.yaml at lines 1 to 7, the Kubernetes
Pod manifest is missing the required apiVersion field. Add the appropriate
apiVersion line above the kind field to specify the API version for the Pod
resource, ensuring the manifest can be applied successfully.

Comment on lines +31 to +37
3. Replace the content of `/path/to/clp-json-package/etc/clp-config.yml` with the output of `demo-assets/init.sh <ip_addr>` where the `<ip_addr>` is the IP address of the host that you are running the clp-package.

4. Launch:
```bash
# You probably want to run in a 3.11 python environment
sbin/start-clp.sh
```
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Insert blank lines around list items and fenced blocks
Lists (1.6.) and the following fenced bash blocks are missing the mandatory blank line before/after, triggering MD031/MD032 and rendering inconsistently.

 3. Replace the content …
-4. Launch:
-```bash
+# 
+4. Launch:
+
+```bash

Apply the same pattern to the other ordered-list sections.

🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

33-33: Lists should be surrounded by blank lines

(MD032, blanks-around-lists)


34-34: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)

🤖 Prompt for AI Agents
In yscope-k8s/README.md around lines 31 to 37, the ordered list items and the
fenced bash code blocks lack the required blank lines before and after, causing
markdown linting errors MD031 and MD032 and inconsistent rendering. Add a blank
line before and after each list item and fenced code block, including the one
before the "4. Launch:" line and the fenced bash block following it. Apply this
spacing pattern consistently to all other ordered list sections in the file.

@@ -0,0 +1,104 @@
# Setup local K8s cluster for presto + clp
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Keep a single H1 and demote the others
The doc currently defines five #-level headings, which violates MD025 and breaks the logical outline. Only the title should stay at #; demote the rest to ##.

-# Launch clp-package
+# ## Launch clp-package
...
-# Create k8s Cluster
+# ## Create k8s Cluster
...
-# Working with helm chart
+# ## Working with helm chart
...
-# Delete k8s Cluster
+# ## Delete k8s Cluster

Also applies to: 26-26, 53-53, 59-59, 94-94

🤖 Prompt for AI Agents
In yscope-k8s/README.md at lines 1, 26, 53, 59, and 94, the markdown headings
use multiple H1 (#) levels which violates MD025 and disrupts the document
structure. Keep only the main title at H1 (#) and change all other headings to
H2 (##) or lower as appropriate to maintain a proper logical outline.

Comment on lines +45 to +51
6. Use the following command to update the CLP metadata database so that the worker can find the archives in right place:
```bash
# Install mysql-client if necessary
sudo apt update && sudo apt install -y mysql-client
# Find the user and password in /path/to/clp-json-package/etc/credential.yml
mysql -h ${REPLACE_IP} -P 6001 -u ${REPLACE_USER} -p'${REPLACE_PASSWORD}' clp-db -e "UPDATE clp_datasets SET archive_storage_directory = '/var/data/archives/default';"
```
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Minor grammar fix for clarity

-mysql … -e "UPDATE clp_datasets SET archive_storage_directory = '/var/data/archives/default';"
+mysql … -e "UPDATE clp_datasets SET archive_storage_directory = '/var/data/archives/default';"
                                               ^ add “the”

“…find the archives in the right place”

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~45-~45: Possible missing article found.
Context: ...hat the worker can find the archives in right place: ```bash # Install mysql-client i...

(AI_HYDRA_LEO_MISSING_THE)

🪛 markdownlint-cli2 (0.17.2)

45-45: Ordered list item prefix
Expected: 1; Actual: 6; Style: 1/1/1

(MD029, ol-prefix)


45-45: Lists should be surrounded by blank lines

(MD032, blanks-around-lists)


46-46: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)

🤖 Prompt for AI Agents
In yscope-k8s/README.md around lines 45 to 51, update the phrase "find the
archives in right place" to "find the archives in the right place" by adding the
missing article "the" for grammatical correctness and clarity.

Comment on lines +37 to +39
- name: "clp-archives-volume"
hostPath:
path: "/var/data/archives"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

hostPath ties the pod to a single node
Mounting /var/data/archives via hostPath blocks pod rescheduling and breaks multi-node clusters. Consider a PersistentVolumeClaim or an emptyDir seeded by an init-container copy for portability.

🤖 Prompt for AI Agents
In yscope-k8s/templates/presto/presto-coordinator.yaml around lines 37 to 39,
the use of hostPath for the clp-archives-volume ties the pod to a single node,
preventing rescheduling and harming multi-node cluster portability. Replace the
hostPath volume with a PersistentVolumeClaim to allow dynamic storage allocation
across nodes, or alternatively use an emptyDir volume combined with an
init-container to seed data, ensuring the pod can be rescheduled on any node
without dependency on local paths.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🔭 Outside diff range comments (1)
yscope-k8s/.helmignore (1)

26-28: Add explicit patterns for the large binaries referenced in prior reviews

The two heavyweight artefacts (mongod-256MB-presto-clp.log.tar.gz and presto-cli-0.293-executable.jar) discussed in earlier feedback are still not excluded. Shipping them inside the chart will bloat every helm package / helm install and noticeably slow CI/CD pipelines.

 # Demo assets
 /demo-assets/
+# Large binaries – keep the chart lean
+mongod-*-presto-*.tar.gz
+presto-cli-*.jar
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 80ec62d and e40f329.

📒 Files selected for processing (1)
  • yscope-k8s/.helmignore (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: anlowee
PR: y-scope/presto#7
File: .github/workflows/hive-tests.yml:33-35
Timestamp: 2025-06-04T14:50:43.554Z
Learning: In the y-scope/presto repository, GitHub workflows are extended from OSS Presto and should maintain consistent formatting with the upstream patterns, including YAML indentation styles for workflow steps.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
  • GitHub Check: test (17.0.13, :presto-main)
  • GitHub Check: test (17.0.13, :presto-main-base)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-resource-manager)
  • GitHub Check: test (8.0.442, :presto-main)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-tpch-distributed-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-local-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-aggregation-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P presto-tests-general)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-local-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-plan-determinism)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-tpch-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: maven-checks (17.0.13)
  • GitHub Check: maven-checks (8.0.442)

Comment on lines +4 to +19
.DS_Store
# Common VCS dirs
.git/
.gitignore
.bzr/
.bzrignore
.hg/
.hgignore
.svn/
# Common backup files
*.swp
*.bak
*.tmp
*.orig
*~

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Minor style nit – drop the leading dots for Helm-relative paths

Unlike .gitignore, Helm’s ignore patterns are evaluated from the chart root, so foo/ is sufficient—there’s no need for ./foo or .foo. Tidying this up is optional but keeps the list consistent.

🤖 Prompt for AI Agents
In yscope-k8s/.helmignore between lines 4 and 19, remove the leading dots from
the directory and file patterns to match Helm's ignore pattern style, which
evaluates paths relative to the chart root. For example, change '.git/' to
'git/' and '.gitignore' to 'gitignore'. This adjustment tidies up the file and
aligns it with Helm's expected pattern format.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🔭 Outside diff range comments (1)
yscope-k8s/values.yaml (1)

12-13: Remove superfluous trailing blank line

YAMLlint flags the empty line after the document. Delete it to satisfy linting and keep the file tidy.

-    replicas: 1
-
+    replicas: 1
♻️ Duplicate comments (4)
yscope-k8s/values.yaml (1)

11-12: Standardise key name to match upstream charts

replicaCount is still camel-cased, whereas upstream Presto charts (and Kubernetes conventions) use the flat replicas. Keeping the same key name eases value-file sharing and tooling.

-    replicaCount: 1
+    replicas: 1
yscope-k8s/templates/presto/presto-coordinator-config.yaml (1)

9-13: Do not hard-code DB credentials in ConfigMap – move to a Secret

Storing ${REPLACE_ME} placeholders still encourages users to commit real credentials later. Helm can template Secrets and mount them via envFrom.secretKeyRef, eliminating the risk.

Example fix:

-    clp.metadata-db-user=${REPLACE_ME}
-    clp.metadata-db-password=${REPLACE_ME}
+    clp.metadata-db-user=${DB_USER}
+    clp.metadata-db-password=${DB_PASSWORD}

…and create:

apiVersion: v1
kind: Secret
metadata:
  name: clp-mysql-auth
type: Opaque
stringData:
  DB_USER:  "clp_user"
  DB_PASSWORD: "super-secret"

Then mount:

envFrom:
  - secretRef:
      name: clp-mysql-auth
yscope-k8s/README.md (2)

26-60: Multiple top-level # headings violate MD025

Only the document title should be H1. Demote subsequent headings to ## (or deeper) and add blank lines before/after to silence MD022/MD032.

-# Launch clp-package
+# ## Launch clp-package
...
-# Create k8s Cluster
+# ## Create k8s Cluster
...
-# Working with helm chart
+# ## Working with helm chart

45-51: Missing article “the”

“…find the archives in the right place”

Tiny grammar tweak improves readability.

- the archives in right place
+ the archives in the right place
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e40f329 and 8aa5cc1.

📒 Files selected for processing (3)
  • yscope-k8s/README.md (1 hunks)
  • yscope-k8s/templates/presto/presto-coordinator-config.yaml (1 hunks)
  • yscope-k8s/values.yaml (1 hunks)
🧠 Learnings (4)
📓 Common learnings
Learnt from: anlowee
PR: y-scope/presto#7
File: .github/workflows/hive-tests.yml:33-35
Timestamp: 2025-06-04T14:50:43.554Z
Learning: In the y-scope/presto repository, GitHub workflows are extended from OSS Presto and should maintain consistent formatting with the upstream patterns, including YAML indentation styles for workflow steps.
yscope-k8s/templates/presto/presto-coordinator-config.yaml (1)

Learnt from: wraymo
PR: #15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: ClpMetadataProvider is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.

yscope-k8s/values.yaml (1)

Learnt from: anlowee
PR: #7
File: .github/workflows/hive-tests.yml:33-35
Timestamp: 2025-06-04T14:50:43.554Z
Learning: In the y-scope/presto repository, GitHub workflows are extended from OSS Presto and should maintain consistent formatting with the upstream patterns, including YAML indentation styles for workflow steps.

yscope-k8s/README.md (1)

Learnt from: anlowee
PR: #7
File: .github/workflows/hive-tests.yml:33-35
Timestamp: 2025-06-04T14:50:43.554Z
Learning: In the y-scope/presto repository, GitHub workflows are extended from OSS Presto and should maintain consistent formatting with the upstream patterns, including YAML indentation styles for workflow steps.

🪛 YAMLlint (1.37.1)
yscope-k8s/values.yaml

[error] 12-12: too many blank lines (1 > 0)

(empty-lines)

🪛 LanguageTool
yscope-k8s/README.md

[uncategorized] ~45-~45: Possible missing article found.
Context: ...hat the worker can find the archives in right place: ```bash # Install mysql-client i...

(AI_HYDRA_LEO_MISSING_THE)


[uncategorized] ~80-~80: Possible missing comma found.
Context: ...al laptop, to access the Presto's WebUI by e.g., http://localhost:8080 To use pre...

(AI_HYDRA_LEO_MISSING_COMMA)

🪛 markdownlint-cli2 (0.17.2)
yscope-k8s/README.md

26-26: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


26-26: Multiple top-level headings in the same document

(MD025, single-title, single-h1)


27-27: Lists should be surrounded by blank lines

(MD032, blanks-around-lists)


33-33: Lists should be surrounded by blank lines

(MD032, blanks-around-lists)


34-34: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


39-39: Ordered list item prefix
Expected: 1; Actual: 5; Style: 1/1/1

(MD029, ol-prefix)


39-39: Lists should be surrounded by blank lines

(MD032, blanks-around-lists)


40-40: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


45-45: Ordered list item prefix
Expected: 1; Actual: 6; Style: 1/1/1

(MD029, ol-prefix)


45-45: Lists should be surrounded by blank lines

(MD032, blanks-around-lists)


46-46: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


53-53: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


53-53: Multiple top-level headings in the same document

(MD025, single-title, single-h1)


55-55: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


59-59: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


59-59: Multiple top-level headings in the same document

(MD025, single-title, single-h1)


60-60: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Above

(MD022, blanks-around-headings)


60-60: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


62-62: Lists should be surrounded by blank lines

(MD032, blanks-around-lists)


74-74: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


74-74: Trailing punctuation in heading
Punctuation: ':'

(MD026, no-trailing-punctuation)


76-76: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


80-80: Bare URL used

(MD034, no-bare-urls)


83-83: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


88-88: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


88-88: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


92-92: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


93-93: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


97-97: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


97-97: Multiple top-level headings in the same document

(MD025, single-title, single-h1)


98-98: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


102-102: Multiple consecutive blank lines
Expected: 1; Actual: 2

(MD012, no-multiple-blanks)

🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: anlowee
PR: y-scope/presto#7
File: .github/workflows/hive-tests.yml:33-35
Timestamp: 2025-06-04T14:50:43.554Z
Learning: In the y-scope/presto repository, GitHub workflows are extended from OSS Presto and should maintain consistent formatting with the upstream patterns, including YAML indentation styles for workflow steps.
yscope-k8s/templates/presto/presto-coordinator-config.yaml (1)

Learnt from: wraymo
PR: #15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: ClpMetadataProvider is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.

yscope-k8s/values.yaml (1)

Learnt from: anlowee
PR: #7
File: .github/workflows/hive-tests.yml:33-35
Timestamp: 2025-06-04T14:50:43.554Z
Learning: In the y-scope/presto repository, GitHub workflows are extended from OSS Presto and should maintain consistent formatting with the upstream patterns, including YAML indentation styles for workflow steps.

yscope-k8s/README.md (1)

Learnt from: anlowee
PR: #7
File: .github/workflows/hive-tests.yml:33-35
Timestamp: 2025-06-04T14:50:43.554Z
Learning: In the y-scope/presto repository, GitHub workflows are extended from OSS Presto and should maintain consistent formatting with the upstream patterns, including YAML indentation styles for workflow steps.

🪛 YAMLlint (1.37.1)
yscope-k8s/values.yaml

[error] 12-12: too many blank lines (1 > 0)

(empty-lines)

🪛 LanguageTool
yscope-k8s/README.md

[uncategorized] ~45-~45: Possible missing article found.
Context: ...hat the worker can find the archives in right place: ```bash # Install mysql-client i...

(AI_HYDRA_LEO_MISSING_THE)


[uncategorized] ~80-~80: Possible missing comma found.
Context: ...al laptop, to access the Presto's WebUI by e.g., http://localhost:8080 To use pre...

(AI_HYDRA_LEO_MISSING_COMMA)

🪛 markdownlint-cli2 (0.17.2)
yscope-k8s/README.md

26-26: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


26-26: Multiple top-level headings in the same document

(MD025, single-title, single-h1)


27-27: Lists should be surrounded by blank lines

(MD032, blanks-around-lists)


33-33: Lists should be surrounded by blank lines

(MD032, blanks-around-lists)


34-34: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


39-39: Ordered list item prefix
Expected: 1; Actual: 5; Style: 1/1/1

(MD029, ol-prefix)


39-39: Lists should be surrounded by blank lines

(MD032, blanks-around-lists)


40-40: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


45-45: Ordered list item prefix
Expected: 1; Actual: 6; Style: 1/1/1

(MD029, ol-prefix)


45-45: Lists should be surrounded by blank lines

(MD032, blanks-around-lists)


46-46: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


53-53: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


53-53: Multiple top-level headings in the same document

(MD025, single-title, single-h1)


55-55: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


59-59: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


59-59: Multiple top-level headings in the same document

(MD025, single-title, single-h1)


60-60: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Above

(MD022, blanks-around-headings)


60-60: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


62-62: Lists should be surrounded by blank lines

(MD032, blanks-around-lists)


74-74: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


74-74: Trailing punctuation in heading
Punctuation: ':'

(MD026, no-trailing-punctuation)


76-76: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


80-80: Bare URL used

(MD034, no-bare-urls)


83-83: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


88-88: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


88-88: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


92-92: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


93-93: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


97-97: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below

(MD022, blanks-around-headings)


97-97: Multiple top-level headings in the same document

(MD025, single-title, single-h1)


98-98: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


102-102: Multiple consecutive blank lines
Expected: 1; Actual: 2

(MD012, no-multiple-blanks)

Comment on lines +33 to +48
4. Launch:
```bash
# You probably want to run in a 3.11 python environment
sbin/start-clp.sh
```

5. Compress:
```bash
# You can also use your own dataset
sbin/compress.sh --timestamp-key 't.dollar_sign_date' datasets/mongod-256MB-processed.log
```

6. Use the following command to update the CLP metadata database so that the worker can find the archives in right place:
```bash
# Install mysql-client if necessary
sudo apt update && sudo apt install -y mysql-client
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Surround list items and fenced blocks with blank lines

Several ordered-list steps (e.g., 3–6) butt directly against code fences, triggering MD031/MD032 and rendering oddly. Insert a blank line before and after each fenced block and between list items.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~45-~45: Possible missing article found.
Context: ...hat the worker can find the archives in right place: ```bash # Install mysql-client i...

(AI_HYDRA_LEO_MISSING_THE)

🪛 markdownlint-cli2 (0.17.2)

33-33: Lists should be surrounded by blank lines

(MD032, blanks-around-lists)


34-34: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


39-39: Ordered list item prefix
Expected: 1; Actual: 5; Style: 1/1/1

(MD029, ol-prefix)


39-39: Lists should be surrounded by blank lines

(MD032, blanks-around-lists)


40-40: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


45-45: Ordered list item prefix
Expected: 1; Actual: 6; Style: 1/1/1

(MD029, ol-prefix)


45-45: Lists should be surrounded by blank lines

(MD032, blanks-around-lists)


46-46: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)

🤖 Prompt for AI Agents
In yscope-k8s/README.md between lines 33 and 48, the ordered list items and
fenced code blocks are not separated by blank lines, causing markdown rendering
issues. Add a blank line before and after each fenced code block and between
each list item to ensure proper spacing and correct markdown formatting.

Comment on lines +88 to +90
```
SELECT * FROM default LIMIT 1;
```
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Add language hint to SQL block

The example query lacks a language tag, tripping MD040 and losing syntax highlighting.

-```
+```sql
 SELECT * FROM default LIMIT 1;

<details>
<summary>🧰 Tools</summary>

<details>
<summary>🪛 markdownlint-cli2 (0.17.2)</summary>

88-88: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)

---

88-88: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

</details>

</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

In yscope-k8s/README.md around lines 88 to 90, the SQL code block is missing a
language hint, causing markdown linting errors and no syntax highlighting. Add
the language tag "sql" immediately after the opening triple backticks to enable
proper syntax highlighting and fix the MD040 lint error.


</details>

<!-- fingerprinting:phantom:triton:cougar -->

<!-- This is an auto-generated comment by CodeRabbit -->

Comment on lines +33 to +35
-server
-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005
-Xmx4G
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Open JDWP port ships in production image

-agentlib:jdwp=…address=*:5005 exposes a remote debugger on every coordinator pod. Unless this chart is strictly for local dev, delete the flag or guard it behind a .Values.debug.enabled toggle to avoid accidental exposure.


🏁 Script executed:

#!/bin/bash
# Check for JDWP flag in the Presto coordinator config
rg "agentlib:jdwp" -n yscope-k8s/templates/presto/presto-coordinator-config.yaml

# Look for an existing debug toggle in the chart values or templates
rg "\.Values\.debug" -n .

Length of output: 184


🏁 Script executed:

#!/bin/bash
# Search for any debug-related toggles in the chart values
rg "^[[:space:]]*debug" -n yscope-k8s/values.yaml

# Search for debug references across all templates
rg "debug" -n yscope-k8s/templates

Length of output: 86


Guard the JDWP JVM option behind a Helm debug toggle

Prevent accidental exposure of the remote debugger by wrapping the -agentlib:jdwp flag in a conditional and defaulting debug to false.

• File: yscope-k8s/templates/presto/presto-coordinator-config.yaml (line 34)
Wrap the JDWP option in an if .Values.debug.enabled block.
• File: yscope-k8s/values.yaml
Add a debug.enabled entry set to false by default.

Suggested diff:

@@ -32,6 +32,12 @@
     -server
-    -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005
     -Xmx4G
+
+    {{- if .Values.debug.enabled }}
+    -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005
+    {{- end }}
+
# values.yaml
 debug:
+  enabled: false
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-server
-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005
-Xmx4G
-server
{{- if .Values.debug.enabled }}
-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005
{{- end }}
-Xmx4G
🤖 Prompt for AI Agents
In yscope-k8s/templates/presto/presto-coordinator-config.yaml around lines 33 to
35, the JVM option for JDWP debugging is always enabled, which risks exposing
the remote debugger unintentionally. Wrap the line containing
'-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005' inside a
Helm conditional block that checks if '.Values.debug.enabled' is true.
Additionally, in yscope-k8s/values.yaml, add a new entry 'debug.enabled' and set
its default value to false to disable debugging by default.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 23

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8aa5cc1 and c8666a4.

📒 Files selected for processing (18)
  • yscope-compose/.env (1 hunks)
  • yscope-compose/README.md (1 hunks)
  • yscope-compose/coordinator/config-template/clp.properties (1 hunks)
  • yscope-compose/coordinator/config-template/config.properties (1 hunks)
  • yscope-compose/coordinator/config-template/jvm.config (1 hunks)
  • yscope-compose/coordinator/config-template/log.properties (1 hunks)
  • yscope-compose/coordinator/config-template/metadata-filter.json (1 hunks)
  • yscope-compose/coordinator/config-template/node.properties (1 hunks)
  • yscope-compose/coordinator/scripts/$ (1 hunks)
  • yscope-compose/coordinator/scripts/generate-configs.sh (1 hunks)
  • yscope-compose/demo-assets/clp-config.yml (1 hunks)
  • yscope-compose/demo-assets/init.sh (1 hunks)
  • yscope-compose/docker-compose.yaml (1 hunks)
  • yscope-compose/worker/config-template/clp.properties (1 hunks)
  • yscope-compose/worker/config-template/config.properties (1 hunks)
  • yscope-compose/worker/config-template/node.properties (1 hunks)
  • yscope-compose/worker/config-template/velox.properties (1 hunks)
  • yscope-compose/worker/scripts/generate-configs.sh (1 hunks)
🧰 Additional context used
🧠 Learnings (10)
📓 Common learnings
Learnt from: anlowee
PR: y-scope/presto#7
File: .github/workflows/hive-tests.yml:33-35
Timestamp: 2025-06-04T14:50:43.554Z
Learning: In the y-scope/presto repository, GitHub workflows are extended from OSS Presto and should maintain consistent formatting with the upstream patterns, including YAML indentation styles for workflow steps.
yscope-compose/worker/config-template/node.properties (1)

Learnt from: anlowee
PR: #7
File: .github/workflows/hive-tests.yml:33-35
Timestamp: 2025-06-04T14:50:43.554Z
Learning: In the y-scope/presto repository, GitHub workflows are extended from OSS Presto and should maintain consistent formatting with the upstream patterns, including YAML indentation styles for workflow steps.

yscope-compose/worker/config-template/config.properties (1)

Learnt from: anlowee
PR: #7
File: .github/workflows/hive-tests.yml:33-35
Timestamp: 2025-06-04T14:50:43.554Z
Learning: In the y-scope/presto repository, GitHub workflows are extended from OSS Presto and should maintain consistent formatting with the upstream patterns, including YAML indentation styles for workflow steps.

yscope-compose/coordinator/config-template/clp.properties (1)

Learnt from: wraymo
PR: #15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: ClpMetadataProvider is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.

yscope-compose/coordinator/config-template/config.properties (1)

Learnt from: wraymo
PR: #15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: ClpMetadataProvider is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.

yscope-compose/coordinator/scripts/generate-configs.sh (1)

Learnt from: anlowee
PR: #7
File: .github/workflows/hive-tests.yml:33-35
Timestamp: 2025-06-04T14:50:43.554Z
Learning: In the y-scope/presto repository, GitHub workflows are extended from OSS Presto and should maintain consistent formatting with the upstream patterns, including YAML indentation styles for workflow steps.

yscope-compose/coordinator/scripts/$ (1)

Learnt from: anlowee
PR: #7
File: .github/workflows/hive-tests.yml:33-35
Timestamp: 2025-06-04T14:50:43.554Z
Learning: In the y-scope/presto repository, GitHub workflows are extended from OSS Presto and should maintain consistent formatting with the upstream patterns, including YAML indentation styles for workflow steps.

yscope-compose/docker-compose.yaml (1)

Learnt from: anlowee
PR: #7
File: .github/workflows/hive-tests.yml:33-35
Timestamp: 2025-06-04T14:50:43.554Z
Learning: In the y-scope/presto repository, GitHub workflows are extended from OSS Presto and should maintain consistent formatting with the upstream patterns, including YAML indentation styles for workflow steps.

yscope-compose/.env (2)

Learnt from: anlowee
PR: #7
File: .github/workflows/hive-tests.yml:33-35
Timestamp: 2025-06-04T14:50:43.554Z
Learning: In the y-scope/presto repository, GitHub workflows are extended from OSS Presto and should maintain consistent formatting with the upstream patterns, including YAML indentation styles for workflow steps.

Learnt from: wraymo
PR: #15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: ClpMetadataProvider is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.

yscope-compose/worker/scripts/generate-configs.sh (1)

Learnt from: anlowee
PR: #7
File: .github/workflows/hive-tests.yml:33-35
Timestamp: 2025-06-04T14:50:43.554Z
Learning: In the y-scope/presto repository, GitHub workflows are extended from OSS Presto and should maintain consistent formatting with the upstream patterns, including YAML indentation styles for workflow steps.

🪛 YAMLlint (1.37.1)
yscope-compose/demo-assets/clp-config.yml

[error] 39-39: too many blank lines (1 > 0)

(empty-lines)

yscope-compose/docker-compose.yaml

[error] 39-39: too many blank lines (1 > 0)

(empty-lines)

🪛 markdownlint-cli2 (0.17.2)
yscope-compose/README.md

7-7: Multiple top-level headings in the same document

(MD025, single-title, single-h1)


22-22: Ordered list item prefix
Expected: 1; Actual: 5; Style: 1/1/1

(MD029, ol-prefix)


29-29: Ordered list item prefix
Expected: 1; Actual: 6; Style: 1/1/1

(MD029, ol-prefix)


35-35: Multiple top-level headings in the same document

(MD025, single-title, single-h1)


44-44: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


44-44: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


48-48: Multiple top-level headings in the same document

(MD025, single-title, single-h1)


48-48: Trailing punctuation in heading
Punctuation: ':'

(MD026, no-trailing-punctuation)


61-61: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


65-65: Multiple top-level headings in the same document

(MD025, single-title, single-h1)


71-71: Multiple consecutive blank lines
Expected: 1; Actual: 2

(MD012, no-multiple-blanks)


72-72: Multiple consecutive blank lines
Expected: 1; Actual: 3

(MD012, no-multiple-blanks)

🪛 Shellcheck (0.10.0)
yscope-compose/coordinator/scripts/generate-configs.sh

[info] 10-10: Double quote to prevent globbing and word splitting.

(SC2086)

yscope-compose/demo-assets/init.sh

[warning] 6-6: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 20-20: Declare and assign separately to avoid masking return values.

(SC2155)


[info] 20-20: Double quote to prevent globbing and word splitting.

(SC2086)


[warning] 21-21: Declare and assign separately to avoid masking return values.

(SC2155)


[info] 21-21: Double quote to prevent globbing and word splitting.

(SC2086)


[info] 39-39: Double quote to prevent globbing and word splitting.

(SC2086)

yscope-compose/worker/scripts/generate-configs.sh

[info] 10-10: Double quote to prevent globbing and word splitting.

(SC2086)


[warning] 81-81: In POSIX sh, HOSTNAME is undefined.

(SC3028)


[warning] 82-82: In POSIX sh, HOSTNAME is undefined.

(SC3028)

🪛 dotenv-linter (3.3.0)
yscope-compose/.env

[warning] 2-2: [QuoteCharacter] The value has quote characters (', ")


[warning] 3-3: [QuoteCharacter] The value has quote characters (', ")


[warning] 6-6: [QuoteCharacter] The value has quote characters (', ")


[warning] 7-7: [QuoteCharacter] The value has quote characters (', ")


[warning] 7-7: [UnorderedKey] The PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_URL key should go before the PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_PROVIDERTYPE key


[warning] 8-8: [QuoteCharacter] The value has quote characters (', ")


[warning] 8-8: [UnorderedKey] The PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_NAME key should go before the PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_URL key


[warning] 9-9: [QuoteCharacter] The value has quote characters (', ")


[warning] 9-9: [UnorderedKey] The PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_USER key should go before the PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_PROVIDERTYPE key


[warning] 10-10: [QuoteCharacter] The value has quote characters (', ")


[warning] 10-10: [UnorderedKey] The PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_PASSWORD key should go before the PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_URL key


[warning] 11-11: [QuoteCharacter] The value has quote characters (', ")


[warning] 12-12: [QuoteCharacter] The value has quote characters (', ")


[warning] 15-15: [QuoteCharacter] The value has quote characters (', ")


[warning] 16-16: [QuoteCharacter] The value has quote characters (', ")


[warning] 19-19: [QuoteCharacter] The value has quote characters (', ")


[warning] 20-20: [QuoteCharacter] The value has quote characters (', ")


[warning] 20-20: [UnorderedKey] The PRESTO_COORDINATOR_CONFIG_JVMCONFIG_G1HEAPREGIONSIZE key should go before the PRESTO_COORDINATOR_CONFIG_JVMCONFIG_MAXHEAPSIZE key


[warning] 23-23: [QuoteCharacter] The value has quote characters (', ")


[warning] 26-26: [QuoteCharacter] The value has quote characters (', ")


[warning] 29-29: [QuoteCharacter] The value has quote characters (', ")


[warning] 32-32: [QuoteCharacter] The value has quote characters (', ")

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (18)
  • GitHub Check: test (17.0.13, :presto-main)
  • GitHub Check: test (17.0.13, :presto-main-base)
  • GitHub Check: test (8.0.442, :presto-main)
  • GitHub Check: test (17.0.13, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-local-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-local-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-plan-determinism)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-plan-determinism)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-aggregation-queries)
  • GitHub Check: test (8.0.442, :presto-main-base)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-tpch-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-tpch-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: maven-checks (8.0.442)
  • GitHub Check: maven-checks (17.0.13)
  • GitHub Check: prestissimo-worker-images-build
🔇 Additional comments (11)
yscope-compose/worker/config-template/velox.properties (1)

1-1: Double-check that the property key is recognised by Velox

mutable-config=true is not a standard Presto worker property; it appears Velox-specific. Confirm that the Velox plugin actually consumes this key (and that the init script does not overwrite it). If not required, drop it to avoid silent misconfiguration.

yscope-compose/worker/config-template/clp.properties (1)

1-1: LGTM – minimal but sufficient

A single‐line connector declaration is all that’s needed here.

yscope-compose/worker/config-template/node.properties (1)

1-5: Configuration template looks good.

The node properties template correctly defines the essential Presto worker node configuration with appropriate environment variable substitution and placeholders for dynamic values.

yscope-compose/coordinator/scripts/$ (1)

1-19: Verify the filename - appears to be incorrectly named.

The file appears to be named $ but based on the content and AI summary, this should likely be generate-configs.sh. Please ensure the filename is correct.

The script logic itself is sound for processing configuration templates and setting up the Presto catalog directory structure.

yscope-compose/coordinator/config-template/log.properties (1)

1-2: Logging configuration is properly structured.

The log properties template correctly configures Presto logging with environment variable-driven log level control.

yscope-compose/coordinator/config-template/jvm.config (1)

1-10: JVM configuration follows best practices.

The JVM configuration is well-structured with appropriate G1GC settings, memory management options, and environment variable-driven customization suitable for Presto coordinator deployments.

yscope-compose/coordinator/config-template/clp.properties (1)

2-8: Verify long environment-variable names & keep naming consistent

A few placeholders mix camel-case and all-caps (…_METADATA_PROVIDERTYPE vs …_SPLITPROVIDER). Double-check that

  1. the actual variable names are exported exactly as written, and
  2. the spelling (SPLITPROVIDER vs SPLIT_PROVIDER_TYPE, missing _TYPE suffix) matches what ClpConnectorFactory expects at runtime.
    Small mismatches silently fall back to defaults and are painful to debug later.
yscope-compose/coordinator/config-template/config.properties (1)

8-13: Feature-flag defaults diverge from upstream Presto

Upstream keeps optimizer.optimize-hash-generation enabled and inline-sql-functions true. Turning them off changes query plans and may surprise users migrating configs. If this deviation is intentional, add an inline comment explaining why.

yscope-compose/docker-compose.yaml (1)

17-23: Image name inconsistency

Coordinator image path is …/presto/coordinator:dev, worker is …/presto/prestissimo-worker:dev. Confirm the registry actually hosts both tags and that the worker’s path is correct; “prestissimo-worker” looks like an internal staging tag.

yscope-compose/worker/scripts/generate-configs.sh (1)

4-4: apt-get may not exist in the Presto image.

Many lightweight images (Alpine, Distroless, Amazon Linux) ship without apt.
If this script runs in such an image the init container will fail before it even starts.

Confirm that the worker image is Debian-based, or switch to a package manager-agnostic approach (e.g., rely on wget already being in the image, or add it at build time).

yscope-compose/.env (1)

35-35: Reminder: populate CLP_PACKAGE_ARCHIVES.

CLP_PACKAGE_ARCHIVES=REPLACE_ME is required by demo-assets/init.sh.
Leaving it unchanged will break catalogue generation.

Comment on lines +9 to +11
PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_USER="clp-user"
PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_PASSWORD="123456"
PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_TABLEPREFIX="clp_"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Plain-text credentials committed to VCS.

PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_PASSWORD="123456" exposes a real (or realistic) password.
For public repos consider:

• Replacing with an obvious placeholder
• Loading sensitive values via external .env.local ignored by Git

This helps prevent accidental leaks when the configuration evolves beyond local demos.

🧰 Tools
🪛 dotenv-linter (3.3.0)

[warning] 9-9: [QuoteCharacter] The value has quote characters (', ")


[warning] 9-9: [UnorderedKey] The PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_USER key should go before the PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_PROVIDERTYPE key


[warning] 10-10: [QuoteCharacter] The value has quote characters (', ")


[warning] 10-10: [UnorderedKey] The PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_PASSWORD key should go before the PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_URL key


[warning] 11-11: [QuoteCharacter] The value has quote characters (', ")

🤖 Prompt for AI Agents
In yscope-compose/.env around lines 9 to 11, the database password is stored as
a plain-text credential, which risks exposure in version control. Replace the
actual password value with a placeholder string like "your_password_here" and
configure the application to load the real password from an external .env.local
file that is included in .gitignore to prevent committing sensitive data to the
repository.

clp.metadata-db-password=${PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_PASSWORD}
clp.metadata-table-prefix=${PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_TABLEPREFIX}
clp.split-provider-type=${PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_SPLITPROVIDER}
clp.metadata-filter-config=/opt/presto-server/etc/metadata-filter.json
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Hard-coded path may break inside k8s & Docker images

clp.metadata-filter-config=/opt/presto-server/etc/metadata-filter.json assumes the file is copied to the etc dir by the init script. If that copy step is ever skipped, the connector fails at startup. Consider parameterising the path or validating the file in generate-configs.sh.

🤖 Prompt for AI Agents
In yscope-compose/coordinator/config-template/clp.properties at line 9, the path
to metadata-filter-config is hard-coded, which can cause failures if the file is
not copied as expected in k8s or Docker environments. To fix this, parameterize
the path using an environment variable or configuration placeholder instead of a
fixed absolute path, and update generate-configs.sh to validate the existence of
the file before startup to prevent silent failures.

Comment on lines +3 to +7
http-server.http.port=${PRESTO_COORDINATOR_HTTPPORT}
query.max-memory=${PRESTO_COORDINATOR_CONFIG_CONFIGPROPERTIES_QUERY_MAXMEMORY}
query.max-memory-per-node=${PRESTO_COORDINATOR_CONFIG_CONFIGPROPERTIES_QUERY_MAXMEMORYPERNODE}
discovery-server.enabled=true
discovery.uri=http://${PRESTO_COORDINATOR_SERVICENAME}:${PRESTO_COORDINATOR_HTTPPORT}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Potential port clash when multiple coordinators run on one host

http-server.http.port=${PRESTO_COORDINATOR_HTTPPORT} is reused in discovery.uri. If you spin up several dev clusters on the same machine, they’ll all default to the same port and only the first will start. Adding an optional offset or allowing ${COMPOSE_PROJECT_NAME} in the env file prevents this clash.

🤖 Prompt for AI Agents
In yscope-compose/coordinator/config-template/config.properties lines 3 to 7,
the http-server.http.port and discovery.uri use the same environment variable
PRESTO_COORDINATOR_HTTPPORT, causing port conflicts when multiple coordinators
run on one host. To fix this, modify the port assignment to include an optional
offset or incorporate the ${COMPOSE_PROJECT_NAME} variable in the environment
file to generate unique port values per instance, preventing clashes.

Comment on lines +1 to +3
{
}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Empty config likely to be ignored – add a placeholder or explanatory comment

Presto’s metadata filter expects an explicit JSON structure (e.g., an array of regex filters). Committing an empty object without explanatory context makes it hard for operators to know whether the file is intentionally empty or incomplete.

Consider adding at least a minimal placeholder field or inline comment to clarify intent, or remove the file until a real filter is needed.


🌐 Web query:

Presto Hive connector metadata filter JSON structure example

💡 Result:

A Presto/Trino Hive connector metadata filter JSON structure typically defines which databases, schemas, tables, and columns to include or exclude when ingesting metadata, most often for use with metadata tools like DataHub or OpenMetadata. The filter configuration is usually provided as part of a larger ingestion recipe or JSON configuration object.

Below is an example of such a JSON structure based on common patterns observed in Presto/Trino Hive connector and metadata tool documentation[1][2][6][7]:

{
  "source": {
    "type": "presto",
    "serviceName": "my_presto",
    "host_port": "presto.example.com:8080",
    "database": "hive",
    "filter": {
      "database_pattern": {
        "allow": ["^analytics$", "^sales_.*"],
        "deny": ["^test_.*"]
      },
      "schema_pattern": {
        "allow": ["^public$", "^prod_.*"],
        "deny": ["^temp_.*"]
      },
      "table_pattern": {
        "allow": ["^.*_log$", "^user_data$"],
        "deny": ["^deprecated_.*"]
      },
      "column_pattern": {
        "deny": ["^debug_.*"]
      }
    }
  }
}

Key Fields and Usage:

  • database_pattern: Regex to allow or deny databases from ingestion. For example, only databases called "analytics" or starting with "sales_" are included[1][2].
  • schema_pattern: Controls which schemas are included or excluded.
  • table_pattern: Regexes for allowing or denying tables—e.g., include tables ending with "log" or named "user_data," but exclude those starting with "deprecated"[1].
  • column_pattern: Optionally filter columns (rarely used for Presto/Hive, but some frameworks support it).

Alternate/Additional Fields:

  • tables_where_clause_suffix / schemas_where_clause_suffix: Sometimes, SQL-style WHERE clause fragments are used to filter which tables or schemas are discovered[1].
  • view_pattern: If supported, similar pattern-matching for views.
  • catalog_to_connector_details: In Trino/Presto, advanced setups may map catalogs to detailed connector info, but this is for lineage, not filtering[2][6].

Ingestion tools like DataHub and OpenMetadata may use similar but not identical JSON, sometimes within a YAML configuration, and sometimes with field names such as pattern or filter_pattern[1][7].

Caveats:

  • The above is a generic example derived from DataHub, OpenMetadata, and Presto/Trino connector patterns. The exact field names and structure may vary based on the ingestion tool or connector version in use[1][2][6][7].
  • Presto/Trino do not natively support metadata filtering via JSON to the connector itself—these patterns are mostly for ingestion/configuration tooling, not for the connector’s internal configuration[1][3].

If you are using a specific tool (like DataHub, OpenMetadata, or custom ingestion scripts), consult its documentation for the exact schema required. The structure shown here is widely compatible for such use cases.

Citations:


Add a minimal placeholder to metadata‐filter.json
Empty JSON objects are ignored by Presto’s ingestion tooling, so operators can’t tell if the file is intentional. Please include at least a stubbed filter block (or a _comment field) to clarify intent.

• File: yscope-compose/coordinator/config-template/metadata-filter.json

Suggested diff:

- {}
+ {
+   "filter": {
+     "database_pattern": {
+       "allow": ["^.*$"],    // allow all databases by default
+       "deny": []            // no databases denied
+     }
+   }
+ }
🤖 Prompt for AI Agents
In yscope-compose/coordinator/config-template/metadata-filter.json at lines 1 to
3, the JSON file is currently empty, which causes Presto’s ingestion tooling to
ignore it and leaves operators uncertain if the file is intentional. To fix
this, add a minimal placeholder such as a stubbed "filter" block or a "_comment"
field inside the JSON object to clarify the file's purpose and ensure it is
recognized by the tooling.

Comment on lines +1 to +2
node.environment=${PRESTO_COORDINATOR_CONFIG_NODEPROPERTIES_ENVIRONMENT}
node.id=${PRESTO_COORDINATOR_SERVICENAME}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Fail-fast if env vars are missing

Both placeholders rely on environment variables that, if unset, will propagate literally into the running coordinator and cause a startup failure that is hard to diagnose. Add default values in the template or make the generate-configs.sh script fail with a clear error when the vars are absent.


🏁 Script executed:

#!/bin/bash
# Locate any generate-configs.sh scripts in the repository
fd generate-configs.sh -t f || exit 0

# For each one found, print its path and the first 50 lines to see how env vars are handled
for script in $(fd generate-configs.sh -t f); do
  echo "---- Inspecting $script ----"
  head -n 50 "$script"
  echo
done

Length of output: 2750


Add explicit checks for missing coordinator env vars

The generate-configs.sh in yscope-compose/coordinator/scripts/ pipes your templates through a fresh sh (with only set –e) so unset variables quietly expand to empty strings rather than failing. To avoid deploying a node.properties with blank values, one of the following should be applied:

• In yscope-compose/coordinator/scripts/generate-configs.sh (before doing any substitutions), add fail-fast checks such as:

: "${PRESTO_COORDINATOR_CONFIG_NODEPROPERTIES_ENVIRONMENT:?Error: PRESTO_COORDINATOR_CONFIG_NODEPROPERTIES_ENVIRONMENT must be set}"
: "${PRESTO_COORDINATOR_SERVICENAME:?Error: PRESTO_COORDINATOR_SERVICENAME must be set}"

• Or declare safe defaults in the template (yscope-compose/coordinator/config-template/node.properties), for example:

node.environment=${PRESTO_COORDINATOR_CONFIG_NODEPROPERTIES_ENVIRONMENT:-production}
node.id=${PRESTO_COORDINATOR_SERVICENAME:-coordinator}

Either approach will surface a clear error or provide a sensible default rather than let Presto start with empty values.

🤖 Prompt for AI Agents
In yscope-compose/coordinator/config-template/node.properties at lines 1-2, the
environment variables used for node.environment and node.id can expand to empty
strings if unset, causing invalid configuration. To fix this, either update
yscope-compose/coordinator/scripts/generate-configs.sh to add fail-fast checks
for these variables before substitution, or modify the node.properties template
to provide safe default values using the syntax ${VAR:-default}. This ensures
that missing variables either cause a clear error or fallback to sensible
defaults, preventing deployment of blank values.

Comment on lines +29 to +46
6. Use the following command to update `.env`:

```bash
demo-assets/init.sh update_metadata_config /path/to/clp-json-package
```

# Create Docker Cluster

Create a local docker stack:

```bash
docker compose up
```

To create a docker stack with more than 1 worker (e.g., 3 workers):
```
docker compose up --scale presto-worker=3
```
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Ordered-list numbering is off

The list jumps from 4 → 5 → 6 because of manual numbering. Switch to Markdown auto-numbering (1. for every item) to avoid drift.

-5. Compress:
-6. Use the following command to update `.env`:
+1. Compress  
+2. Update `.env`:

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

29-29: Ordered list item prefix
Expected: 1; Actual: 6; Style: 1/1/1

(MD029, ol-prefix)


35-35: Multiple top-level headings in the same document

(MD025, single-title, single-h1)


44-44: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


44-44: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
In yscope-compose/README.md between lines 29 and 46, the ordered list numbering
is manually set and causes numbering to jump incorrectly. Change all list item
numbers to "1." to use Markdown's automatic numbering feature, which will ensure
the list numbers correctly without manual updates.

Comment on lines +61 to +65
```sql
SELECT * FROM default LIMIT 1;
```

# Delete docker Cluster
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Trailing punctuation in heading & duplicate H1

Rename # Use cli:## Using the CLI to remove the colon and keep heading levels unique.

🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

61-61: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


65-65: Multiple top-level headings in the same document

(MD025, single-title, single-h1)

🤖 Prompt for AI Agents
In yscope-compose/README.md around lines 61 to 65, the heading "# Use cli:" has
trailing punctuation and duplicates the H1 level. Rename this heading to "##
Using the CLI" to remove the colon and change it to a second-level heading,
ensuring unique and consistent heading levels.

Comment on lines +1 to +6
discovery.uri=http://${PRESTO_COORDINATOR_SERVICENAME}:${PRESTO_COORDINATOR_HTTPPORT}
presto.version=REPLACE_ME
http-server.http.port=${PRESTO_WORKER_HTTPPORT}
shutdown-onset-sec=1
register-test-functions=false
runtime-metrics-collection-enabled=false
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Automate replacement of REPLACE_ME & validate custom keys

presto.version=REPLACE_ME will break the worker if the update step in generate-configs.sh ever fails. Recommend:

  1. Change the placeholder to an obviously invalid value (e.g., UNKNOWN) and have the script exit non-zero if it still finds that value after substitution.
  2. Verify that non-stock keys like register-test-functions and runtime-metrics-collection-enabled are recognised by the intended Presto build; otherwise they will be ignored silently.

Example diff for (1):

-presto.version=REPLACE_ME
+presto.version=UNKNOWN   # replaced by generate-configs.sh

I’ll verify how generate-configs.sh handles the REPLACE_ME placeholder and look for any validation logic:


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Locate the generate-configs.sh script
SCRIPT=$(fd generate-configs.sh | head -n1)
echo "Inspecting: $SCRIPT"

# Check for REPLACE_ME usage
grep -n -C 3 'REPLACE_ME' "$SCRIPT" || echo "⚠️ No occurrence of REPLACE_ME found"

# Check where presto.version is set or validated
grep -n -C 3 'presto.version' "$SCRIPT" || echo "⚠️ No direct handling of presto.version in this script"

Length of output: 632


Automate presto.version substitution and validate custom keys

The generate-configs.sh script currently does not handle the REPLACE_ME placeholder (or presto.version) and will leave it untouched—causing the worker to fail at runtime. To address this:

• In yscope-compose/worker/config-template/config.properties

-presto.version=REPLACE_ME
+presto.version=UNKNOWN   # replaced by generate-configs.sh

• In yscope-compose/coordinator/scripts/generate-configs.sh

  1. After loading $PRESTO_VERSION, replace “UNKNOWN” in generated configs:
    find "$OUTPUT_DIR" -type f -name '*.properties' \
      -exec sed -i "s/UNKNOWN/${PRESTO_VERSION}/g" {} +
  2. Then verify no placeholder remains:
    if grep -R -q 'UNKNOWN' "$OUTPUT_DIR"; then
      echo "Error: presto.version was not populated" >&2
      exit 1
    fi

• Manually confirm that custom keys
register-test-functions
runtime-metrics-collection-enabled
are supported by your Presto build (see Presto’s documentation or code) to avoid silent ignores.

🤖 Prompt for AI Agents
In yscope-compose/worker/config-template/config.properties lines 1 to 6, the
presto.version key is set to the placeholder REPLACE_ME which is not
automatically replaced, causing runtime failures. To fix this, update the
generate-configs.sh script in yscope-compose/coordinator/scripts to replace the
placeholder with the actual PRESTO_VERSION variable after loading it by running
a sed command to substitute UNKNOWN with the PRESTO_VERSION in all generated
.properties files. Then add a verification step that searches for any remaining
UNKNOWN placeholders and exits with an error if found. Also, manually verify
that the custom keys register-test-functions and
runtime-metrics-collection-enabled are supported by your Presto build to prevent
silent ignores.

Comment on lines +9 to +11
find /configs -type f | while read -r f; do
( echo "cat <<EOF"; cat $f; echo "EOF" ) | sh > "${PRESTO_CONFIG_DIR}/$(basename "$f")"
done
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Quote $f to avoid issues with filenames containing spaces.

-  ( echo "cat <<EOF"; cat $f; echo "EOF" ) | sh > "${PRESTO_CONFIG_DIR}/$(basename "$f")"
+  ( echo "cat <<EOF"; cat "$f"; echo "EOF" ) | sh > "${PRESTO_CONFIG_DIR}/$(basename "$f")"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
find /configs -type f | while read -r f; do
( echo "cat <<EOF"; cat $f; echo "EOF" ) | sh > "${PRESTO_CONFIG_DIR}/$(basename "$f")"
done
find /configs -type f | while read -r f; do
( echo "cat <<EOF"; cat "$f"; echo "EOF" ) | sh > "${PRESTO_CONFIG_DIR}/$(basename "$f")"
done
🧰 Tools
🪛 Shellcheck (0.10.0)

[info] 10-10: Double quote to prevent globbing and word splitting.

(SC2086)

🤖 Prompt for AI Agents
In yscope-compose/worker/scripts/generate-configs.sh at lines 9 to 11, the
variable $f is used without quotes, which can cause errors if filenames contain
spaces. To fix this, enclose $f in double quotes wherever it appears in the
script to ensure filenames with spaces are handled correctly.

Comment on lines +69 to +83
INTERNAL_ADDRESS=$(hostname -i)
# Replace `node.internal-address=REPLACE_ME` with actual ip address in the config file
if grep -q '^node.internal-address=REPLACE_ME' "$NODE_PROPERTIES_FILE"; then
sed -i "s|^node.internal-address=REPLACE_ME|node.internal-address=${INTERNAL_ADDRESS}|" "$NODE_PROPERTIES_FILE"
echo "Updated $NODE_PROPERTIES_FILE with node.internal-address ${INTERNAL_ADDRESS}"
else
echo "Warning: 'node.internal-address=REPLACE_ME' not found in $NODE_PROPERTIES_FILE"
exit 1
fi

# Replace `node.id=REPLACE_ME` with actual hostname in the config file
if grep -q '^node.id=REPLACE_ME' "$NODE_PROPERTIES_FILE"; then
sed -i "s|^node.id=REPLACE_ME|node.id=$HOSTNAME|" "$NODE_PROPERTIES_FILE"
echo "Updated $NODE_PROPERTIES_FILE with node.id $HOSTNAME"
else
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

HOSTNAME is not POSIX-portable – use $(hostname) instead.

Some /bin/sh implementations (e.g., BusyBox) do not export HOSTNAME.
Retrieve the value explicitly to avoid empty node.id.

-INTERNAL_ADDRESS=$(hostname -i)
-...
-    sed -i "s|^node.id=REPLACE_ME|node.id=$HOSTNAME|" "$NODE_PROPERTIES_FILE"
+INTERNAL_ADDRESS="$(hostname -i)"
+NODE_ID="$(hostname)"
+...
+    sed -i "s|^node.id=REPLACE_ME|node.id=${NODE_ID}|" "$NODE_PROPERTIES_FILE"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
INTERNAL_ADDRESS=$(hostname -i)
# Replace `node.internal-address=REPLACE_ME` with actual ip address in the config file
if grep -q '^node.internal-address=REPLACE_ME' "$NODE_PROPERTIES_FILE"; then
sed -i "s|^node.internal-address=REPLACE_ME|node.internal-address=${INTERNAL_ADDRESS}|" "$NODE_PROPERTIES_FILE"
echo "Updated $NODE_PROPERTIES_FILE with node.internal-address ${INTERNAL_ADDRESS}"
else
echo "Warning: 'node.internal-address=REPLACE_ME' not found in $NODE_PROPERTIES_FILE"
exit 1
fi
# Replace `node.id=REPLACE_ME` with actual hostname in the config file
if grep -q '^node.id=REPLACE_ME' "$NODE_PROPERTIES_FILE"; then
sed -i "s|^node.id=REPLACE_ME|node.id=$HOSTNAME|" "$NODE_PROPERTIES_FILE"
echo "Updated $NODE_PROPERTIES_FILE with node.id $HOSTNAME"
else
INTERNAL_ADDRESS="$(hostname -i)"
NODE_ID="$(hostname)"
# Replace `node.internal-address=REPLACE_ME` with actual ip address in the config file
if grep -q '^node.internal-address=REPLACE_ME' "$NODE_PROPERTIES_FILE"; then
sed -i "s|^node.internal-address=REPLACE_ME|node.internal-address=${INTERNAL_ADDRESS}|" "$NODE_PROPERTIES_FILE"
echo "Updated $NODE_PROPERTIES_FILE with node.internal-address ${INTERNAL_ADDRESS}"
else
echo "Warning: 'node.internal-address=REPLACE_ME' not found in $NODE_PROPERTIES_FILE"
exit 1
fi
# Replace `node.id=REPLACE_ME` with actual hostname in the config file
if grep -q '^node.id=REPLACE_ME' "$NODE_PROPERTIES_FILE"; then
sed -i "s|^node.id=REPLACE_ME|node.id=${NODE_ID}|" "$NODE_PROPERTIES_FILE"
echo "Updated $NODE_PROPERTIES_FILE with node.id $HOSTNAME"
else
🧰 Tools
🪛 Shellcheck (0.10.0)

[warning] 81-81: In POSIX sh, HOSTNAME is undefined.

(SC3028)


[warning] 82-82: In POSIX sh, HOSTNAME is undefined.

(SC3028)

🤖 Prompt for AI Agents
In yscope-compose/worker/scripts/generate-configs.sh around lines 69 to 83, the
script uses the variable HOSTNAME which is not guaranteed to be set in all
POSIX-compliant shells. Replace all occurrences of HOSTNAME with the command
substitution $(hostname) to explicitly retrieve the hostname, ensuring
compatibility across different shell environments and avoiding empty node.id
values.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants