-
Notifications
You must be signed in to change notification settings - Fork 706
Sync db catalogs. #3138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sync db catalogs. #3138
Conversation
|
Claude finished @khustup2's task in 1m 44s —— View job Code Review Complete
Issues FoundFound 7 significant issues requiring attention: 🔴 Critical: Missing Files (Build Failure)
🔴 Critical: SQL Injection Vulnerabilities (Security)
C++ Changes Look Good ✓The C++ changes for catalog synchronization are well-structured:
|
| COPY --chmod=444 ./LICENSE /LICENSE | ||
| COPY ./postgres/docker-entrypoint.d/ /docker-entrypoint-initdb.d/ | ||
| RUN apt-get install --no-install-recommends -y /tmp/debs/pg-deeplake-${VERSION}_${TARGETARCH}.deb && rm -rf /tmp/debs/ | ||
| COPY ./serverless/scripts/init-deeplake-stateless.sh /tmp/init-deeplake-stateless.sh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing source files: These COPY commands reference files in ./serverless/ directory that don't exist in this PR or repository, causing build failures when STATELESS=true.
Fix: Either add the missing files to this PR or remove the stateless Docker build logic until the serverless infrastructure is ready.
|
|
||
| def drop_table(conn, table_name): | ||
| with conn.cursor() as cur: | ||
| cur.execute(f"DROP TABLE IF EXISTS {table_name} CASCADE;") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SQL injection risk: Using f-string with user-controlled table_name parameter allows SQL injection if table names come from untrusted sources.
Fix: Use sql.Identifier from psycopg2 to safely quote table names:
| cur.execute(f"DROP TABLE IF EXISTS {table_name} CASCADE;") | |
| cur.execute(sql.SQL("DROP TABLE IF EXISTS {} CASCADE;").format(sql.Identifier(table_name))) |
| def create_table(conn, table_name, table_def): | ||
| columns = table_def['columns'] | ||
| col_defs = ', '.join([f"{name} {dtype}" for name, dtype, *_ in columns]) | ||
| sql = f"CREATE TABLE {table_name} ({col_defs}) USING deeplake;" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SQL injection risk: Same issue - table_name should be properly quoted as an identifier.
Fix: Use sql.Identifier:
| sql = f"CREATE TABLE {table_name} ({col_defs}) USING deeplake;" | |
| sql_query = sql.SQL("CREATE TABLE {} ({}) USING deeplake;").format( | |
| sql.Identifier(table_name), | |
| sql.SQL(col_defs) | |
| ) | |
| with conn.cursor() as cur: | |
| cur.execute(sql_query) |
| if lines_in_chunk >= CHUNK_SIZE: | ||
| break | ||
|
|
||
| if lines_in_chunk == 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SQL injection risk: table_name in COPY command needs proper identifier quoting.
Fix: Use sql.Identifier:
| if lines_in_chunk == 0: | |
| cur.copy_expert( | |
| sql.SQL("COPY {} FROM STDIN WITH (FORMAT csv, DELIMITER '|')").format( | |
| sql.Identifier(table_name) | |
| ), | |
| chunk | |
| ) |
| log(f" chunk {chunk_num}: {lines_in_chunk:,} rows committed ({total_lines:,} total)") | ||
|
|
||
| return total_lines | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SQL injection risk: Same identifier quoting issue.
Fix: Use sql.Identifier:
| cur.execute(sql.SQL("SELECT COUNT(*) FROM {};").format(sql.Identifier(table_name))) |
|
|
||
|
|
||
| def run_vacuum(conn, table_name): | ||
| old_autocommit = conn.autocommit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SQL injection risk: VACUUM command also needs proper identifier quoting.
Fix: Use sql.Identifier:
| old_autocommit = conn.autocommit | |
| cur.execute(sql.SQL("VACUUM ANALYZE {};").format(sql.Identifier(table_name))) |
| @@ -0,0 +1,385 @@ | |||
| #!/usr/bin/env python3 | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing import for SQL injection fix: Add from psycopg2 import sql at the top to support safe identifier quoting in SQL queries below.
a2be413 to
56df3d2
Compare
|



🚀 🚀 Pull Request
Impact
Description
Things to be aware of
Things to worry about
Additional Context