Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
35 changes: 35 additions & 0 deletions yscope-compose/.env
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
# Coordinator common
PRESTO_COORDINATOR_HTTPPORT="8080"
PRESTO_COORDINATOR_SERVICENAME="presto-coordinator"

# Coordinator clp.properties
PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_PROVIDERTYPE="mysql"
PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_URL="jdbc:mysql://localhost:6001"
PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_NAME="clp-db"
PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_USER="clp-user"
PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_PASSWORD="123456"
PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_TABLEPREFIX="clp_"
Comment on lines +9 to +11
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Plain-text credentials committed to VCS.

PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_PASSWORD="123456" exposes a real (or realistic) password.
For public repos consider:

• Replacing with an obvious placeholder
• Loading sensitive values via external .env.local ignored by Git

This helps prevent accidental leaks when the configuration evolves beyond local demos.

🧰 Tools
🪛 dotenv-linter (3.3.0)

[warning] 9-9: [QuoteCharacter] The value has quote characters (', ")


[warning] 9-9: [UnorderedKey] The PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_USER key should go before the PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_PROVIDERTYPE key


[warning] 10-10: [QuoteCharacter] The value has quote characters (', ")


[warning] 10-10: [UnorderedKey] The PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_PASSWORD key should go before the PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_URL key


[warning] 11-11: [QuoteCharacter] The value has quote characters (', ")

🤖 Prompt for AI Agents
In yscope-compose/.env around lines 9 to 11, the database password is stored as
a plain-text credential, which risks exposure in version control. Replace the
actual password value with a placeholder string like "your_password_here" and
configure the application to load the real password from an external .env.local
file that is included in .gitignore to prevent committing sensitive data to the
repository.

PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_SPLITPROVIDER="mysql"

# Coordinator config.properties
PRESTO_COORDINATOR_CONFIG_CONFIGPROPERTIES_QUERY_MAXMEMORY="1GB"
PRESTO_COORDINATOR_CONFIG_CONFIGPROPERTIES_QUERY_MAXMEMORYPERNODE="1GB"

# Coordinator jvm.config
PRESTO_COORDINATOR_CONFIG_JVMCONFIG_MAXHEAPSIZE="4G"
PRESTO_COORDINATOR_CONFIG_JVMCONFIG_G1HEAPREGIONSIZE="32M"

# Coordinator log.properties
PRESTO_COORDINATOR_CONFIG_LOGPROPERTIES_LEVEL="DEBUG"

# Coordinator node.properties
PRESTO_COORDINATOR_CONFIG_NODEPROPERTIES_ENVIRONMENT="production"

# Worker common
PRESTO_WORKER_HTTPPORT="8080"

# Worker node.properties
PRESTO_WORKER_CONFIG_NODEPROPERTIES_LOCATION="worker-location"

# CLP package archives
CLP_PACKAGE_ARCHIVES=REPLACE_ME
76 changes: 76 additions & 0 deletions yscope-compose/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,76 @@
# Setup local docker stack for presto + clp

## Install docker

Follow the guide here: [docker]

Comment on lines +1 to +6
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Multiple H1 headings & missing link reference label

Markdown-lint flags MD025 and an undefined link reference [docker]. Use a single top-level # and move subsequent sections to ##. Also declare the link label before usage or inline.

-# Setup local docker stack for presto + clp
+# Presto + CLP local Docker stack-Follow the guide here: [docker]
+Follow the official [Docker installation guide](https://docs.docker.com/engine/install).
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Setup local docker stack for presto + clp
## Install docker
Follow the guide here: [docker]
# Presto + CLP local Docker stack
## Install docker
Follow the official [Docker installation guide](https://docs.docker.com/engine/install).
🤖 Prompt for AI Agents
In yscope-compose/README.md lines 1 to 6, fix the markdown by ensuring only one
top-level heading (#) is used; change subsequent headings to second-level (##).
Also, define the link reference label [docker] before using it or replace it
with an inline link to avoid the undefined link reference error.

# Launch clp-package

1. Find the clp-package for test on our official website [clp-json-v0.4.0]. Here is a sample dataset for demo testing: [postgresql dataset].

2. Untar the clp-package and the postgresql dataset.

3. Replace the content of `/path/to/clp-json-package/etc/clp-config.yml` with the output of `demo-assets/init.sh generate_sample_clp_config`.

4. Launch:

```bash
# You probably want to run a python 3.9 or newer virtual environment
sbin/start-clp.sh
```

5. Compress:

```bash
# You can also use your own dataset
sbin/compress.sh --timestamp-key 'timestamp' /path/to/postgresql.log
```
Comment on lines +17 to +27
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Code fences lack blank lines & language specifiers

MD031/MD040—surround code fences with blank lines and specify the language for syntax highlight:

-```bash
+# ```bash
 docker compose up
-```
+```

Repeat for every fenced block.

🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

22-22: Ordered list item prefix
Expected: 1; Actual: 5; Style: 1/1/1

(MD029, ol-prefix)

🤖 Prompt for AI Agents
In yscope-compose/README.md around lines 17 to 27, the fenced code blocks lack
blank lines before and after them and do not specify the language for syntax
highlighting. To fix this, add a blank line before and after each fenced code
block and include the language identifier (e.g., bash) immediately after the
opening triple backticks for proper markdown formatting and syntax highlighting.


6. Use the following command to update `.env`:

```bash
demo-assets/init.sh update_metadata_config /path/to/clp-json-package
```

# Create Docker Cluster

Create a local docker stack:

```bash
docker compose up
```

To create a docker stack with more than 1 worker (e.g., 3 workers):
```
docker compose up --scale presto-worker=3
```
Comment on lines +29 to +46
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Ordered-list numbering is off

The list jumps from 4 → 5 → 6 because of manual numbering. Switch to Markdown auto-numbering (1. for every item) to avoid drift.

-5. Compress:
-6. Use the following command to update `.env`:
+1. Compress  
+2. Update `.env`:

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

29-29: Ordered list item prefix
Expected: 1; Actual: 6; Style: 1/1/1

(MD029, ol-prefix)


35-35: Multiple top-level headings in the same document

(MD025, single-title, single-h1)


44-44: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


44-44: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
In yscope-compose/README.md between lines 29 and 46, the ordered list numbering
is manually set and causes numbering to jump incorrectly. Change all list item
numbers to "1." to use Markdown's automatic numbering feature, which will ensure
the list numbers correctly without manual updates.


# Use cli:

After all containers are in "Started" states (check by `docker ps`):

```bash
# On your host
docker exec -it compose-presto-coordinator-1 sh

# In presto-coordinator container
/opt/presto-cli --catalog clp --schema default --server localhost:8080
```

Example query:
```sql
SELECT * FROM default LIMIT 1;
```

# Delete docker Cluster
Comment on lines +61 to +65
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Trailing punctuation in heading & duplicate H1

Rename # Use cli:## Using the CLI to remove the colon and keep heading levels unique.

🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

61-61: Fenced code blocks should be surrounded by blank lines

(MD031, blanks-around-fences)


65-65: Multiple top-level headings in the same document

(MD025, single-title, single-h1)

🤖 Prompt for AI Agents
In yscope-compose/README.md around lines 61 to 65, the heading "# Use cli:" has
trailing punctuation and duplicates the H1 level. Rename this heading to "##
Using the CLI" to remove the colon and change it to a second-level heading,
ensuring unique and consistent heading levels.


```bash
docker compose down
```



[clp-json-v0.4.0]: https://github.com/y-scope/clp/releases/tag/v0.4.0
[docker]: https://docs.docker.com/engine/install
[postgresql dataset]: https://zenodo.org/records/10516402

10 changes: 10 additions & 0 deletions yscope-compose/coordinator/config-template/clp.properties
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
connector.name=clp
clp.metadata-provider-type=${PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_PROVIDERTYPE}
clp.metadata-db-url=${PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_URL}
clp.metadata-db-name=${PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_NAME}
clp.metadata-db-user=${PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_USER}
clp.metadata-db-password=${PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_PASSWORD}
clp.metadata-table-prefix=${PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_TABLEPREFIX}
clp.split-provider-type=${PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_SPLITPROVIDER}
clp.metadata-filter-config=/opt/presto-server/etc/metadata-filter.json
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Hard-coded path may break inside k8s & Docker images

clp.metadata-filter-config=/opt/presto-server/etc/metadata-filter.json assumes the file is copied to the etc dir by the init script. If that copy step is ever skipped, the connector fails at startup. Consider parameterising the path or validating the file in generate-configs.sh.

🤖 Prompt for AI Agents
In yscope-compose/coordinator/config-template/clp.properties at line 9, the path
to metadata-filter-config is hard-coded, which can cause failures if the file is
not copied as expected in k8s or Docker environments. To fix this, parameterize
the path using an environment variable or configuration placeholder instead of a
fixed absolute path, and update generate-configs.sh to validate the existence of
the file before startup to prevent silent failures.


14 changes: 14 additions & 0 deletions yscope-compose/coordinator/config-template/config.properties
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=${PRESTO_COORDINATOR_HTTPPORT}
query.max-memory=${PRESTO_COORDINATOR_CONFIG_CONFIGPROPERTIES_QUERY_MAXMEMORY}
query.max-memory-per-node=${PRESTO_COORDINATOR_CONFIG_CONFIGPROPERTIES_QUERY_MAXMEMORYPERNODE}
discovery-server.enabled=true
discovery.uri=http://${PRESTO_COORDINATOR_SERVICENAME}:${PRESTO_COORDINATOR_HTTPPORT}
Comment on lines +3 to +7
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Potential port clash when multiple coordinators run on one host

http-server.http.port=${PRESTO_COORDINATOR_HTTPPORT} is reused in discovery.uri. If you spin up several dev clusters on the same machine, they’ll all default to the same port and only the first will start. Adding an optional offset or allowing ${COMPOSE_PROJECT_NAME} in the env file prevents this clash.

🤖 Prompt for AI Agents
In yscope-compose/coordinator/config-template/config.properties lines 3 to 7,
the http-server.http.port and discovery.uri use the same environment variable
PRESTO_COORDINATOR_HTTPPORT, causing port conflicts when multiple coordinators
run on one host. To fix this, modify the port assignment to include an optional
offset or incorporate the ${COMPOSE_PROJECT_NAME} variable in the environment
file to generate unique port values per instance, preventing clashes.

optimizer.optimize-hash-generation=false
regex-library=RE2J
use-alternative-function-signatures=true
inline-sql-functions=false
nested-data-serialization-enabled=false
native-execution-enabled=true

10 changes: 10 additions & 0 deletions yscope-compose/coordinator/config-template/jvm.config
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
-server
-Xmx${PRESTO_COORDINATOR_CONFIG_JVMCONFIG_MAXHEAPSIZE}
-XX:+UseG1GC
-XX:G1HeapRegionSize=${PRESTO_COORDINATOR_CONFIG_JVMCONFIG_G1HEAPREGIONSIZE}
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
-Djdk.attach.allowAttachSelf=true

2 changes: 2 additions & 0 deletions yscope-compose/coordinator/config-template/log.properties
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
com.facebook.presto=${PRESTO_COORDINATOR_CONFIG_LOGPROPERTIES_LEVEL}

Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
{
}

Comment on lines +1 to +3
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Empty config likely to be ignored – add a placeholder or explanatory comment

Presto’s metadata filter expects an explicit JSON structure (e.g., an array of regex filters). Committing an empty object without explanatory context makes it hard for operators to know whether the file is intentionally empty or incomplete.

Consider adding at least a minimal placeholder field or inline comment to clarify intent, or remove the file until a real filter is needed.


🌐 Web query:

Presto Hive connector metadata filter JSON structure example

💡 Result:

A Presto/Trino Hive connector metadata filter JSON structure typically defines which databases, schemas, tables, and columns to include or exclude when ingesting metadata, most often for use with metadata tools like DataHub or OpenMetadata. The filter configuration is usually provided as part of a larger ingestion recipe or JSON configuration object.

Below is an example of such a JSON structure based on common patterns observed in Presto/Trino Hive connector and metadata tool documentation[1][2][6][7]:

{
  "source": {
    "type": "presto",
    "serviceName": "my_presto",
    "host_port": "presto.example.com:8080",
    "database": "hive",
    "filter": {
      "database_pattern": {
        "allow": ["^analytics$", "^sales_.*"],
        "deny": ["^test_.*"]
      },
      "schema_pattern": {
        "allow": ["^public$", "^prod_.*"],
        "deny": ["^temp_.*"]
      },
      "table_pattern": {
        "allow": ["^.*_log$", "^user_data$"],
        "deny": ["^deprecated_.*"]
      },
      "column_pattern": {
        "deny": ["^debug_.*"]
      }
    }
  }
}

Key Fields and Usage:

  • database_pattern: Regex to allow or deny databases from ingestion. For example, only databases called "analytics" or starting with "sales_" are included[1][2].
  • schema_pattern: Controls which schemas are included or excluded.
  • table_pattern: Regexes for allowing or denying tables—e.g., include tables ending with "log" or named "user_data," but exclude those starting with "deprecated"[1].
  • column_pattern: Optionally filter columns (rarely used for Presto/Hive, but some frameworks support it).

Alternate/Additional Fields:

  • tables_where_clause_suffix / schemas_where_clause_suffix: Sometimes, SQL-style WHERE clause fragments are used to filter which tables or schemas are discovered[1].
  • view_pattern: If supported, similar pattern-matching for views.
  • catalog_to_connector_details: In Trino/Presto, advanced setups may map catalogs to detailed connector info, but this is for lineage, not filtering[2][6].

Ingestion tools like DataHub and OpenMetadata may use similar but not identical JSON, sometimes within a YAML configuration, and sometimes with field names such as pattern or filter_pattern[1][7].

Caveats:

  • The above is a generic example derived from DataHub, OpenMetadata, and Presto/Trino connector patterns. The exact field names and structure may vary based on the ingestion tool or connector version in use[1][2][6][7].
  • Presto/Trino do not natively support metadata filtering via JSON to the connector itself—these patterns are mostly for ingestion/configuration tooling, not for the connector’s internal configuration[1][3].

If you are using a specific tool (like DataHub, OpenMetadata, or custom ingestion scripts), consult its documentation for the exact schema required. The structure shown here is widely compatible for such use cases.

Citations:


Add a minimal placeholder to metadata‐filter.json
Empty JSON objects are ignored by Presto’s ingestion tooling, so operators can’t tell if the file is intentional. Please include at least a stubbed filter block (or a _comment field) to clarify intent.

• File: yscope-compose/coordinator/config-template/metadata-filter.json

Suggested diff:

- {}
+ {
+   "filter": {
+     "database_pattern": {
+       "allow": ["^.*$"],    // allow all databases by default
+       "deny": []            // no databases denied
+     }
+   }
+ }
🤖 Prompt for AI Agents
In yscope-compose/coordinator/config-template/metadata-filter.json at lines 1 to
3, the JSON file is currently empty, which causes Presto’s ingestion tooling to
ignore it and leaves operators uncertain if the file is intentional. To fix
this, add a minimal placeholder such as a stubbed "filter" block or a "_comment"
field inside the JSON object to clarify the file's purpose and ensure it is
recognized by the tooling.

3 changes: 3 additions & 0 deletions yscope-compose/coordinator/config-template/node.properties
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
node.environment=${PRESTO_COORDINATOR_CONFIG_NODEPROPERTIES_ENVIRONMENT}
node.id=${PRESTO_COORDINATOR_SERVICENAME}
Comment on lines +1 to +2
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Fail-fast if env vars are missing

Both placeholders rely on environment variables that, if unset, will propagate literally into the running coordinator and cause a startup failure that is hard to diagnose. Add default values in the template or make the generate-configs.sh script fail with a clear error when the vars are absent.


🏁 Script executed:

#!/bin/bash
# Locate any generate-configs.sh scripts in the repository
fd generate-configs.sh -t f || exit 0

# For each one found, print its path and the first 50 lines to see how env vars are handled
for script in $(fd generate-configs.sh -t f); do
  echo "---- Inspecting $script ----"
  head -n 50 "$script"
  echo
done

Length of output: 2750


Add explicit checks for missing coordinator env vars

The generate-configs.sh in yscope-compose/coordinator/scripts/ pipes your templates through a fresh sh (with only set –e) so unset variables quietly expand to empty strings rather than failing. To avoid deploying a node.properties with blank values, one of the following should be applied:

• In yscope-compose/coordinator/scripts/generate-configs.sh (before doing any substitutions), add fail-fast checks such as:

: "${PRESTO_COORDINATOR_CONFIG_NODEPROPERTIES_ENVIRONMENT:?Error: PRESTO_COORDINATOR_CONFIG_NODEPROPERTIES_ENVIRONMENT must be set}"
: "${PRESTO_COORDINATOR_SERVICENAME:?Error: PRESTO_COORDINATOR_SERVICENAME must be set}"

• Or declare safe defaults in the template (yscope-compose/coordinator/config-template/node.properties), for example:

node.environment=${PRESTO_COORDINATOR_CONFIG_NODEPROPERTIES_ENVIRONMENT:-production}
node.id=${PRESTO_COORDINATOR_SERVICENAME:-coordinator}

Either approach will surface a clear error or provide a sensible default rather than let Presto start with empty values.

🤖 Prompt for AI Agents
In yscope-compose/coordinator/config-template/node.properties at lines 1-2, the
environment variables used for node.environment and node.id can expand to empty
strings if unset, causing invalid configuration. To fix this, either update
yscope-compose/coordinator/scripts/generate-configs.sh to add fail-fast checks
for these variables before substitution, or modify the node.properties template
to provide safe default values using the syntax ${VAR:-default}. This ensures
that missing variables either cause a clear error or fallback to sensible
defaults, preventing deployment of blank values.


19 changes: 19 additions & 0 deletions yscope-compose/coordinator/scripts/$
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
#!/bin/sh

# Exit on error
set -e

PRESTO_CONFIG_DIR="/opt/presto-server/etc"

# Substitute environemnt variables in config template
find /configs -type f | while read -r f; do
( echo "cat <<EOF"; cat $f; echo "EOF" ) | sh > "${PRESTO_CONFIG_DIR}/$(basename "$f")"
done

# Setup the config directory hierarchy
rm -rf ${PRESTO_CONFIG_DIR}/catalog
mkdir -p ${PRESTO_CONFIG_DIR}/catalog

# Copy over files
mv ${PRESTO_CONFIG_DIR}/clp.properties ${PRESTO_CONFIG_DIR}/catalog

18 changes: 18 additions & 0 deletions yscope-compose/coordinator/scripts/generate-configs.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
#!/bin/sh

# Exit on error
set -e

PRESTO_CONFIG_DIR="/opt/presto-server/etc"

# Substitute environemnt variables in config template
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Fix typo in comment.

The comment has a spelling error: "environemnt" should be "environment".

-# Substitute environemnt variables in config template
+# Substitute environment variables in config template
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Substitute environemnt variables in config template
# Substitute environment variables in config template
🤖 Prompt for AI Agents
In yscope-compose/coordinator/scripts/generate-configs.sh at line 8, correct the
typo in the comment by changing "environemnt" to "environment" to fix the
spelling error.

find /configs -type f | while read -r f; do
( echo "cat <<EOF"; cat $f; echo "EOF" ) | sh > "${PRESTO_CONFIG_DIR}/$(basename "$f")"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add quotes around variable to prevent word splitting.

The variable should be quoted to prevent potential issues with filenames containing spaces or special characters.

-  ( echo "cat <<EOF"; cat $f; echo "EOF" ) | sh > "${PRESTO_CONFIG_DIR}/$(basename "$f")"
+  ( echo "cat <<EOF"; cat "$f"; echo "EOF" ) | sh > "${PRESTO_CONFIG_DIR}/$(basename "$f")"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
( echo "cat <<EOF"; cat $f; echo "EOF" ) | sh > "${PRESTO_CONFIG_DIR}/$(basename "$f")"
( echo "cat <<EOF"; cat "$f"; echo "EOF" ) | sh > "${PRESTO_CONFIG_DIR}/$(basename "$f")"
🧰 Tools
🪛 Shellcheck (0.10.0)

[info] 10-10: Double quote to prevent globbing and word splitting.

(SC2086)

🤖 Prompt for AI Agents
In yscope-compose/coordinator/scripts/generate-configs.sh at line 10, the
variable $f is used without quotes, which can cause word splitting or globbing
issues if the filename contains spaces or special characters. To fix this, add
double quotes around $f wherever it appears in the command to ensure the
filename is treated as a single argument.

done

# Setup the config directory hierarchy
rm -f ${PRESTO_CONFIG_DIR}/catalog/*
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Consider consistency with catalog cleanup approach.

This version uses rm -f for catalog cleanup, while there's another version of this script that uses rm -rf. Consider standardising the approach across both versions for consistency.

🤖 Prompt for AI Agents
In yscope-compose/coordinator/scripts/generate-configs.sh at line 14, the script
uses 'rm -f' to clean up the catalog directory, but another version uses 'rm
-rf'. To maintain consistency, update this line to use 'rm -rf' instead of 'rm
-f' so that the cleanup approach matches across both script versions.


# Copy over files
mv ${PRESTO_CONFIG_DIR}/clp.properties ${PRESTO_CONFIG_DIR}/catalog

39 changes: 39 additions & 0 deletions yscope-compose/demo-assets/clp-config.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,39 @@
package:
storage_engine: "clp-s"
database:
type: "mariadb"
host: "${REPLACE_IP}"
port: 6001
name: "clp-db"
query_scheduler:
host: "${REPLACE_IP}"
port: 6002
jobs_poll_delay: 0.1
num_archives_to_search_per_sub_job: 16
logging_level: "INFO"
queue:
host: "${REPLACE_IP}"
port: 6003
redis:
host: "${REPLACE_IP}"
port: 6004
query_backend_database: 0
compression_backend_database: 1
reducer:
host: "${REPLACE_IP}"
base_port: 6100
logging_level: "INFO"
upsert_interval: 100
results_cache:
host: "${REPLACE_IP}"
port: 6005
db_name: "clp-query-results"
stream_collection_name: "stream-files"
webui:
host: "localhost"
port: 6000
logging_level: "INFO"
log_viewer_webui:
host: "localhost"
port: 6006
Comment on lines +5 to +38
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Use a single source for the service host

Almost every section repeats host: "${REPLACE_IP}". Consider introducing a YAML anchor/alias or env-var (${CLP_HOST}) to avoid copy-paste errors:

defaults: &host
  host: "${CLP_HOST}"

database:
  <<: *host
  port: 6001
  
queue:
  <<: *host
  port: 6003
🤖 Prompt for AI Agents
In yscope-compose/demo-assets/clp-config.yml between lines 5 and 38, the host
value "${REPLACE_IP}" is repeated multiple times across different service
sections. To avoid duplication and potential copy-paste errors, define a YAML
anchor or use a single environment variable like "${CLP_HOST}" once, then
reference it in each service section using YAML anchors and aliases. This will
centralize the host configuration and make the file easier to maintain.


Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Remove extra blank line at EOF

yamllint reports an extra blank line. Keeping the file tidy prevents noisy diffs.

🧰 Tools
🪛 YAMLlint (1.37.1)

[error] 39-39: too many blank lines (1 > 0)

(empty-lines)

🤖 Prompt for AI Agents
In yscope-compose/demo-assets/clp-config.yml at line 39, there is an extra blank
line at the end of the file. Remove this trailing blank line to comply with
yamllint rules and keep the file tidy, preventing unnecessary diffs in future
commits.

50 changes: 50 additions & 0 deletions yscope-compose/demo-assets/init.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,50 @@
#!/usr/bin/env bash

SCRIPT_PATH="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

function generate_sample_clp_config {
local ip=$(hostname -i)
local file="${SCRIPT_PATH}/clp-config.yml"
cp "$file" "${file}.bak"
sed -i "s|\${REPLACE_IP}|$ip|g" "$file"
echo "Replaced \${REPLACE_IP} with $ip in $file"
}
Comment on lines +5 to +11
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

generate_sample_clp_config is copying the wrong way around.

The intention (per the docs) is to copy clp-config.yml.bakclp-config.yml, then patch the placeholder.
The current code instead backs up an already-patched file, so the very first run destroys the pristine template.

-function generate_sample_clp_config {
-    local ip=$(hostname -i)
-    local file="${SCRIPT_PATH}/clp-config.yml"
-    cp "$file" "${file}.bak"
-    sed -i "s|\${REPLACE_IP}|$ip|g" "$file"
+function generate_sample_clp_config {
+    local ip
+    ip="$(hostname -i)"
+
+    local template="${SCRIPT_PATH}/clp-config.yml.bak"
+    local file="${SCRIPT_PATH}/clp-config.yml"
+
+    if [[ ! -f "$template" ]]; then
+        echo "Error: $template not found"; return 1
+    fi
+
+    cp "$template" "$file"
+    sed -i "s|\${REPLACE_IP}|$ip|g" "$file"
🧰 Tools
🪛 Shellcheck (0.10.0)

[warning] 6-6: Declare and assign separately to avoid masking return values.

(SC2155)

🤖 Prompt for AI Agents
In yscope-compose/demo-assets/init.sh around lines 5 to 11, the
generate_sample_clp_config function copies clp-config.yml to clp-config.yml.bak,
which overwrites the original template on the first run. To fix this, reverse
the copy direction to copy clp-config.yml.bak to clp-config.yml before applying
the sed replacement, preserving the pristine template as the source.


function update_metadata_config {
if [[ $# -ne 1 ]]; then
echo "Usage: update_metadata_config </path/to/clp-package>"
return 1
fi

local clp_pkg_home=$1
local clp_config_path="$(readlink -f ${clp_pkg_home})/etc/clp-config.yml"
local credential_path="$(readlink -f ${clp_pkg_home})/etc/credentials.yml"
Comment on lines +20 to +21
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Quote shell variables to avoid word-splitting and globbing.

readlink -f ${clp_pkg_home} and the last sed command omit quotes.
Wrap them in double quotes to handle paths containing spaces.

-    local clp_config_path="$(readlink -f ${clp_pkg_home})/etc/clp-config.yml"
-    ...
-    sed -i "s|^CLP_PACKAGE_ARCHIVES=.*|CLP_PACKAGE_ARCHIVES=\"$(readlink -f ${clp_pkg_home})/var/data/archives/default\"|" "$env_path"
+    local clp_config_path="$(readlink -f "${clp_pkg_home}")/etc/clp-config.yml"
+    ...
+    sed -i "s|^CLP_PACKAGE_ARCHIVES=.*|CLP_PACKAGE_ARCHIVES=\"$(readlink -f "${clp_pkg_home}")/var/data/archives/default\"|" "$env_path"

Also applies to: 39-39

🧰 Tools
🪛 Shellcheck (0.10.0)

[warning] 20-20: Declare and assign separately to avoid masking return values.

(SC2155)


[info] 20-20: Double quote to prevent globbing and word splitting.

(SC2086)


[warning] 21-21: Declare and assign separately to avoid masking return values.

(SC2155)


[info] 21-21: Double quote to prevent globbing and word splitting.

(SC2086)

🤖 Prompt for AI Agents
In yscope-compose/demo-assets/init.sh at lines 20-21 and line 39, the shell
variables used inside readlink and sed commands are not quoted, which can cause
word-splitting and globbing issues if paths contain spaces. Fix this by wrapping
all variable expansions like ${clp_pkg_home} in double quotes within these
commands to ensure paths with spaces are handled correctly.

host=$(python3 -c 'import sys, yaml; print(yaml.load(sys.stdin)["database"]["host"])' < "$clp_config_path")
port=$(python3 -c 'import sys, yaml; print(yaml.load(sys.stdin)["database"]["port"])' < "$clp_config_path")
name=$(python3 -c 'import sys, yaml; print(yaml.load(sys.stdin)["database"]["name"])' < "$clp_config_path")
user=$(python3 -c 'import sys, yaml; print(yaml.load(sys.stdin)["database"]["user"])' < "$credential_path")
password=$(python3 -c 'import sys, yaml; print(yaml.load(sys.stdin)["database"]["password"])' < "$credential_path")
Comment on lines +22 to +26
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Unsafe YAML parsing – switch to safe_load.

yaml.load() with the default loader can execute arbitrary Python objects.
Even if the files are local, this is an avoidable risk:

-import sys, yaml; print(yaml.load(sys.stdin)["database"]["host"])
+import sys, yaml; print(yaml.safe_load(sys.stdin.read())["database"]["host"])

Apply the same change to the other four inline Python invocations.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
host=$(python3 -c 'import sys, yaml; print(yaml.load(sys.stdin)["database"]["host"])' < "$clp_config_path")
port=$(python3 -c 'import sys, yaml; print(yaml.load(sys.stdin)["database"]["port"])' < "$clp_config_path")
name=$(python3 -c 'import sys, yaml; print(yaml.load(sys.stdin)["database"]["name"])' < "$clp_config_path")
user=$(python3 -c 'import sys, yaml; print(yaml.load(sys.stdin)["database"]["user"])' < "$credential_path")
password=$(python3 -c 'import sys, yaml; print(yaml.load(sys.stdin)["database"]["password"])' < "$credential_path")
host=$(python3 -c 'import sys, yaml; print(yaml.safe_load(sys.stdin.read())["database"]["host"])' < "$clp_config_path")
port=$(python3 -c 'import sys, yaml; print(yaml.safe_load(sys.stdin.read())["database"]["port"])' < "$clp_config_path")
name=$(python3 -c 'import sys, yaml; print(yaml.safe_load(sys.stdin.read())["database"]["name"])' < "$clp_config_path")
user=$(python3 -c 'import sys, yaml; print(yaml.safe_load(sys.stdin.read())["database"]["user"])' < "$credential_path")
password=$(python3 -c 'import sys, yaml; print(yaml.safe_load(sys.stdin.read())["database"]["password"])' < "$credential_path")
🤖 Prompt for AI Agents
In yscope-compose/demo-assets/init.sh around lines 22 to 26, the use of
yaml.load() without specifying a safe loader poses a security risk. Update all
inline Python commands to use yaml.safe_load() instead of yaml.load() to ensure
safe parsing of YAML files and prevent execution of arbitrary Python objects.

echo "Metadata database host: $host"
echo "Metadata database port: $port"
echo "Metadata database name: $name"
echo "Metadata database user: $user"
echo "Metadata database password: $password"

local env_path="${SCRIPT_PATH}/../.env"
cp "$env_path" "${env_path}.bak"
sed -i "s|^PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_URL=.*|PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_URL=\"jdbc:mysql://${host}:${port}\"|" "$env_path"
sed -i "s/^PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_NAME=.*/PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_NAME=\"${name}\"/" "$env_path"
sed -i "s/^PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_USER=.*/PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_USER=\"${user}\"/" "$env_path"
sed -i "s/^PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_PASSWORD=.*/PRESTO_COORDINATOR_CONFIG_CLPPROPERTIES_METADATA_DATABASE_PASSWORD=\"${password}\"/" "$env_path"
sed -i "s|^CLP_PACKAGE_ARCHIVES=.*|CLP_PACKAGE_ARCHIVES=\"$(readlink -f ${clp_pkg_home})/var/data/archives/default\"|" "$env_path"
}

if declare -f "$1" > /dev/null; then
"$@"
else
echo "Error: '$1' is not a valid function name."
echo "Available functions:"
declare -F | awk '{print $3}'
exit 1
fi

39 changes: 39 additions & 0 deletions yscope-compose/docker-compose.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,39 @@
version: "3.9"

services:
presto-coordinator:
image: ghcr.io/y-scope/presto/coordinator:dev
entrypoint: ["/bin/bash", "-c", "/scripts/generate-configs.sh && /opt/entrypoint.sh"]
env_file:
Comment on lines +5 to +7
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Add explicit restart: unless-stopped for resilience

Without a restart policy, the services stay down after a container crash or host reboot, which hurts dev UX.

🤖 Prompt for AI Agents
In yscope-compose/docker-compose.yaml around lines 5 to 7, add the line
"restart: unless-stopped" under the service definition to ensure the container
automatically restarts after crashes or host reboots, improving resilience and
developer experience.

- .env
volumes:
- ./coordinator/scripts:/scripts:ro
- coordinator-config:/opt/presto-server/etc
- ./coordinator/config-template:/configs:ro
networks:
- presto

presto-worker:
image: ghcr.io/y-scope/presto/prestissimo-worker:dev
depends_on:
presto-coordinator:
condition: service_started
entrypoint: ["/bin/bash", "-c", "/scripts/generate-configs.sh && /opt/entrypoint.sh"]
env_file:
- .env
volumes:
- ./worker/scripts:/scripts:ro
- worker-config:/opt/presto-server/etc
- ./worker/config-template:/configs:ro
- "${CLP_PACKAGE_ARCHIVES}:${CLP_PACKAGE_ARCHIVES}"
networks:
Comment on lines +24 to +29
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Mounting host path via env var can fail on Windows

The volume - "${CLP_PACKAGE_ARCHIVES}:${CLP_PACKAGE_ARCHIVES}" relies on an absolute host path being exported. On Windows or when the var is unset, Compose will create a named volume instead, and the worker won’t see the archives. Add a .env.example with a commented sample path and document OS-specific quoting.

🤖 Prompt for AI Agents
In yscope-compose/docker-compose.yaml around lines 24 to 29, the volume mount
using the environment variable CLP_PACKAGE_ARCHIVES can fail on Windows or if
the variable is unset, causing Docker Compose to create a named volume instead
of mounting the intended host path. To fix this, create a .env.example file with
a commented example of the CLP_PACKAGE_ARCHIVES variable set to an absolute
path, and add documentation explaining how to properly quote or format the path
on different operating systems, especially Windows, to ensure the volume mounts
correctly.

- presto

volumes:
coordinator-config:
worker-config:

networks:
presto:
driver: bridge
Comment on lines +32 to +38
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

One blank line too many

yamllint marks the extra blank line after the network block. Remove it to keep CI green.

-    driver: bridge
-
+    driver: bridge
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
volumes:
coordinator-config:
worker-config:
networks:
presto:
driver: bridge
volumes:
coordinator-config:
worker-config:
networks:
presto:
driver: bridge
🤖 Prompt for AI Agents
In yscope-compose/docker-compose.yaml around lines 32 to 38, there is an extra
blank line after the networks block that causes yamllint to fail. Remove the
blank line immediately following the networks section to comply with yamllint
rules and keep the CI green.


2 changes: 2 additions & 0 deletions yscope-compose/worker/config-template/clp.properties
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
connector.name=clp

7 changes: 7 additions & 0 deletions yscope-compose/worker/config-template/config.properties
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
discovery.uri=http://${PRESTO_COORDINATOR_SERVICENAME}:${PRESTO_COORDINATOR_HTTPPORT}
presto.version=REPLACE_ME
http-server.http.port=${PRESTO_WORKER_HTTPPORT}
shutdown-onset-sec=1
register-test-functions=false
runtime-metrics-collection-enabled=false
Comment on lines +1 to +6
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Automate replacement of REPLACE_ME & validate custom keys

presto.version=REPLACE_ME will break the worker if the update step in generate-configs.sh ever fails. Recommend:

  1. Change the placeholder to an obviously invalid value (e.g., UNKNOWN) and have the script exit non-zero if it still finds that value after substitution.
  2. Verify that non-stock keys like register-test-functions and runtime-metrics-collection-enabled are recognised by the intended Presto build; otherwise they will be ignored silently.

Example diff for (1):

-presto.version=REPLACE_ME
+presto.version=UNKNOWN   # replaced by generate-configs.sh

I’ll verify how generate-configs.sh handles the REPLACE_ME placeholder and look for any validation logic:


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Locate the generate-configs.sh script
SCRIPT=$(fd generate-configs.sh | head -n1)
echo "Inspecting: $SCRIPT"

# Check for REPLACE_ME usage
grep -n -C 3 'REPLACE_ME' "$SCRIPT" || echo "⚠️ No occurrence of REPLACE_ME found"

# Check where presto.version is set or validated
grep -n -C 3 'presto.version' "$SCRIPT" || echo "⚠️ No direct handling of presto.version in this script"

Length of output: 632


Automate presto.version substitution and validate custom keys

The generate-configs.sh script currently does not handle the REPLACE_ME placeholder (or presto.version) and will leave it untouched—causing the worker to fail at runtime. To address this:

• In yscope-compose/worker/config-template/config.properties

-presto.version=REPLACE_ME
+presto.version=UNKNOWN   # replaced by generate-configs.sh

• In yscope-compose/coordinator/scripts/generate-configs.sh

  1. After loading $PRESTO_VERSION, replace “UNKNOWN” in generated configs:
    find "$OUTPUT_DIR" -type f -name '*.properties' \
      -exec sed -i "s/UNKNOWN/${PRESTO_VERSION}/g" {} +
  2. Then verify no placeholder remains:
    if grep -R -q 'UNKNOWN' "$OUTPUT_DIR"; then
      echo "Error: presto.version was not populated" >&2
      exit 1
    fi

• Manually confirm that custom keys
register-test-functions
runtime-metrics-collection-enabled
are supported by your Presto build (see Presto’s documentation or code) to avoid silent ignores.

🤖 Prompt for AI Agents
In yscope-compose/worker/config-template/config.properties lines 1 to 6, the
presto.version key is set to the placeholder REPLACE_ME which is not
automatically replaced, causing runtime failures. To fix this, update the
generate-configs.sh script in yscope-compose/coordinator/scripts to replace the
placeholder with the actual PRESTO_VERSION variable after loading it by running
a sed command to substitute UNKNOWN with the PRESTO_VERSION in all generated
.properties files. Then add a verification step that searches for any remaining
UNKNOWN placeholders and exits with an error if found. Also, manually verify
that the custom keys register-test-functions and
runtime-metrics-collection-enabled are supported by your Presto build to prevent
silent ignores.


5 changes: 5 additions & 0 deletions yscope-compose/worker/config-template/node.properties
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
node.environment=${PRESTO_COORDINATOR_CONFIG_NODEPROPERTIES_ENVIRONMENT}
node.internal-address=REPLACE_ME
node.location=${PRESTO_WORKER_CONFIG_NODEPROPERTIES_LOCATION}
node.id=REPLACE_ME

2 changes: 2 additions & 0 deletions yscope-compose/worker/config-template/velox.properties
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
mutable-config=true

Loading
Loading