Conversation

@khustup2
Contributor

🚀 🚀 Pull Request

Impact

  • Bug fix (non-breaking change which fixes expected existing functionality)
  • Enhancement/New feature (adds functionality without impacting existing logic)
  • Breaking change (fix or feature that would cause existing functionality to change)

Description

Things to be aware of

Things to worry about

Additional Context

@claude

claude bot commented Feb 12, 2026

Claude finished @khustup2's task in 1m 44s


Code Review Complete

  • Read repository guidelines (CLAUDE.md)
  • Examine PR diff and changes
  • Review C++ PostgreSQL extension changes
  • Review new Python script
  • Review workflow and Dockerfile changes
  • Post inline comments for issues found
  • Provide summary

Issues Found

Found 7 significant issues requiring attention:

🔴 Critical: Missing Files (Build Failure)

  • postgres/Dockerfile:32 - References ./serverless/ files that don't exist, so the build will fail when STATELESS=true

🔴 Critical: SQL Injection Vulnerabilities (Security)

  • scripts/tpch_deeplake_ingest.py - 6 instances of unsafe SQL query construction using f-strings with table names
    • Lines: 1 (missing import), 157, 164, 211, 228, 237
    • All table name interpolations need psycopg2.sql.Identifier() for safe quoting
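
The fix pattern is the same in each case; a minimal sketch is shown below, assuming a psycopg2 connection (the helper names are illustrative, not the script's actual functions):

from psycopg2 import sql

def drop_table_safely(conn, table_name):
    # table_name is composed as a quoted identifier, so it cannot break out of the statement
    query = sql.SQL("DROP TABLE IF EXISTS {} CASCADE;").format(sql.Identifier(table_name))
    with conn.cursor() as cur:
        cur.execute(query)

def count_rows(conn, table_name):
    # same pattern for reads: compose with sql.SQL / sql.Identifier instead of f-strings
    with conn.cursor() as cur:
        cur.execute(sql.SQL("SELECT COUNT(*) FROM {};").format(sql.Identifier(table_name)))
        return cur.fetchone()[0]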

C++ Changes Look Good ✓

The C++ changes for catalog synchronization are well-structured:

  • Proper empty string checks before catalog operations
  • Redis cache initialization for distributed environments
  • IsUnderPostmaster guard prevents initialization issues
  • Static local removal in get_deeplake_root_directory() fixes longjmp poisoning

COPY --chmod=444 ./LICENSE /LICENSE
COPY ./postgres/docker-entrypoint.d/ /docker-entrypoint-initdb.d/
RUN apt-get install --no-install-recommends -y /tmp/debs/pg-deeplake-${VERSION}_${TARGETARCH}.deb && rm -rf /tmp/debs/
COPY ./serverless/scripts/init-deeplake-stateless.sh /tmp/init-deeplake-stateless.sh

Missing source files: These COPY commands reference files in the ./serverless/ directory that don't exist in this PR or the repository, so the image build fails when STATELESS=true.

Fix: Either add the missing files to this PR or remove the stateless Docker build logic until the serverless infrastructure is ready.


def drop_table(conn, table_name):
    with conn.cursor() as cur:
        cur.execute(f"DROP TABLE IF EXISTS {table_name} CASCADE;")

SQL injection risk: Using an f-string with the user-controlled table_name parameter allows SQL injection if table names come from untrusted sources.

Fix: Use sql.Identifier from psycopg2 to safely quote table names:

Suggested change
        cur.execute(f"DROP TABLE IF EXISTS {table_name} CASCADE;")
        cur.execute(sql.SQL("DROP TABLE IF EXISTS {} CASCADE;").format(sql.Identifier(table_name)))

def create_table(conn, table_name, table_def):
    columns = table_def['columns']
    col_defs = ', '.join([f"{name} {dtype}" for name, dtype, *_ in columns])
    sql = f"CREATE TABLE {table_name} ({col_defs}) USING deeplake;"

SQL injection risk: Same issue - table_name should be properly quoted as an identifier.

Fix: Use sql.Identifier:

Suggested change
    sql = f"CREATE TABLE {table_name} ({col_defs}) USING deeplake;"
    sql_query = sql.SQL("CREATE TABLE {} ({}) USING deeplake;").format(
        sql.Identifier(table_name),
        sql.SQL(col_defs)
    )
    with conn.cursor() as cur:
        cur.execute(sql_query)
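
Note: the original local variable sql shadows the psycopg2 sql module, which is why the suggestion renames it to sql_query. Also, sql.SQL(col_defs) is inserted verbatim without escaping, so col_defs must be built only from the script's own trusted table definitions, never from external input.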

if lines_in_chunk >= CHUNK_SIZE:
    break

if lines_in_chunk == 0:

SQL injection risk: table_name in COPY command needs proper identifier quoting.

Fix: Use sql.Identifier:

Suggested change
if lines_in_chunk == 0:
cur.copy_expert(
    sql.SQL("COPY {} FROM STDIN WITH (FORMAT csv, DELIMITER '|')").format(
        sql.Identifier(table_name)
    ),
    chunk
)

log(f" chunk {chunk_num}: {lines_in_chunk:,} rows committed ({total_lines:,} total)")

return total_lines


SQL injection risk: Same identifier quoting issue.

Fix: Use sql.Identifier:

Suggested change
cur.execute(sql.SQL("SELECT COUNT(*) FROM {};").format(sql.Identifier(table_name)))



def run_vacuum(conn, table_name):
    old_autocommit = conn.autocommit

SQL injection risk: VACUUM command also needs proper identifier quoting.

Fix: Use sql.Identifier:

Suggested change
    old_autocommit = conn.autocommit
    cur.execute(sql.SQL("VACUUM ANALYZE {};").format(sql.Identifier(table_name)))

@@ -0,0 +1,385 @@
#!/usr/bin/env python3

Missing import for SQL injection fix: Add from psycopg2 import sql at the top to support safe identifier quoting in SQL queries below.
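
For example, the top of the script could then look like this (the psycopg2 import is an assumption about the existing file, shown only for context):

#!/usr/bin/env python3
import psycopg2
from psycopg2 import sql  # needed for sql.SQL / sql.Identifier in the fixes suggested above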

@sonarqubecloud

@khustup2 khustup2 merged commit d9224c3 into main Feb 13, 2026
6 checks passed
@khustup2 khustup2 deleted the db-catalog-sync branch February 13, 2026 01:58