
Conversation

@ydayagi (Collaborator) commented Dec 4, 2025

Jira Ticket

FLPATH-2822

Description

Replacing Trino+Hive+Parquet with Postgres for on-prem deployment

Testing

  1. Checkout Branch
  2. Restart Koku
  3. Hit endpoint or launch shell
    1. You should see ...
  4. Do more things...

Release Notes

  • proposed release note
  • [COST-####](https://issues.redhat.com/browse/COST-####) Fix some things

@gemini-code-assist

Summary of Changes

Hello @ydayagi, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the data access and processing mechanisms to support PostgreSQL as a primary data store for on-premise deployments. It introduces a flexible database accessor pattern, allowing the system to seamlessly adapt to either Trino or PostgreSQL environments. The changes primarily focus on abstracting database interactions, updating data ingestion pipelines for Parquet files, and providing conditional feature flag management, all aimed at enhancing deployment flexibility and maintainability for different environments.

Highlights

  • Database Abstraction Layer: Introduced an abstract ReportDBAccessor and concrete implementations for Trino and PostgreSQL, allowing the application to dynamically switch between database backends based on the ONPREM setting (a minimal sketch of this pattern follows this list).
  • On-Premise PostgreSQL Support: Enabled PostgreSQL as the data store for on-premise deployments, replacing the Trino+Hive+Parquet stack. This includes specific SQL generation for schema, table, and partition management in PostgreSQL.
  • Conditional Unleash Client: Implemented a MockUnleashClient and conditional initialization of the Unleash feature flag client based on a new DISABLE_UNLEASH setting, allowing Unleash to be optionally disabled.
  • Parquet Processing Updates: Modified the Parquet report processing logic across various cloud providers (AWS, Azure, GCP, OCP) to accommodate the new database abstraction, including conditional data handling and table/partition creation for PostgreSQL.
  • Dependency Updates: Updated Pipfile to pin boto3 to a specific version, constrain unleashclient to <6.0.0, and add sqlalchemy as a new dependency.
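
The bullets above describe an accessor/factory arrangement; here is a minimal sketch of that pattern, using the ReportDBAccessor interface and the get_report_db_accessor() factory named in this PR. The concrete class names, method bodies, and signatures are illustrative, not the PR's actual code:

```python
from abc import ABC, abstractmethod

from django.conf import settings


class ReportDBAccessor(ABC):
    """Backend-agnostic interface for the SQL each backend needs."""

    @abstractmethod
    def get_schema_check_sql(self, schema_name: str) -> str:
        """Return SQL that checks whether a schema exists."""


class TrinoReportDBAccessor(ReportDBAccessor):
    def get_schema_check_sql(self, schema_name: str) -> str:
        # Illustrative Trino-flavored SQL; the real statements live in the PR.
        return f"SHOW SCHEMAS LIKE '{schema_name}'"


class PostgresReportDBAccessor(ReportDBAccessor):
    def get_schema_check_sql(self, schema_name: str) -> str:
        # Mirrors the reviewed diff below; the review notes this f-string
        # interpolation should eventually become a parameterized query.
        return f"SELECT 1 FROM information_schema.schemata WHERE schema_name = '{schema_name}'"


def get_report_db_accessor() -> ReportDBAccessor:
    """Choose the accessor implementation from the ONPREM setting."""
    if settings.ONPREM:
        return PostgresReportDBAccessor()
    return TrinoReportDBAccessor()
```
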
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here; an example layout follows.
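
For illustration, such a setup might look like the layout below (config.yaml and styleguide.md are the file names described in the linked documentation; treat the exact names as assumptions):

```text
.gemini/
├── config.yaml      # Gemini Code Assist configuration file
└── styleguide.md    # custom code review style guide, e.g. PEP-8 conventions
```
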

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution.


@gemini-code-assist bot left a comment

Code Review

This pull request introduces a significant architectural change by abstracting database access to support both Trino and PostgreSQL, enabling on-premise deployments to use Postgres instead of a Trino/Hive/Parquet stack. The changes are well-structured, using an abstract base class and factory pattern to switch between database backends based on the ONPREM setting.

My review focuses on the correctness and security of the new implementation. I've identified a few critical issues:

  • A potential SQL injection vulnerability in the PostgreSQL accessor.
  • A bug in the Trino accessor that would cause SQL syntax errors.
  • An incomplete (stubbed) implementation for data deletion in Postgres, which could lead to data inconsistencies.
  • A date-handling bug in partition creation logic that would fail at the end of the year.

I've provided detailed comments and suggestions for each of these points. Addressing them will be crucial for the stability and security of the new on-premise data architecture.

Comment on lines 57 to 62
def get_schema_check_sql(self, schema_name: str):
"""Return the SQL to check if a schema exists."""
return f"SELECT 1 FROM information_schema.schemata WHERE schema_name = '{schema_name}'"

def get_table_check_sql(self, table_name: str, schema_name: str):
return f"SELECT 1 FROM information_schema.tables WHERE table_name = '{table_name}' AND table_schema = '{schema_name}'"


Severity: high

These methods construct SQL queries using f-strings with values that could potentially be controlled by external factors, which is a SQL injection risk. While schema_name and table_name might currently come from trusted sources, it is a security best practice to always use parameterized queries.

A proper fix would involve a broader refactoring:

  1. The ReportDBAccessor interface (e.g., get_schema_check_sql) should be changed to return a tuple of (sql_template, parameters).
  2. The calling method (_execute_trino_sql in report_parquet_processor_base.py) should be updated to accept these parameters and pass them to cursor.execute().

This would make the database interaction safer and more robust against injection attacks.
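
As a minimal sketch of that refactor (the tuple shape, the %s paramstyle, and the executor name are assumptions, not the PR's code):

```python
# Accessors return (sql_template, parameters) instead of interpolated SQL.
def get_schema_check_sql(self, schema_name: str):
    """Return (sql_template, parameters) rather than an interpolated string."""
    return (
        "SELECT 1 FROM information_schema.schemata WHERE schema_name = %s",
        (schema_name,),
    )


def _execute_sql(self, cursor, sql, params=None):
    # The driver binds the %s placeholders itself, so schema_name can never
    # be spliced into the SQL text.
    cursor.execute(sql, params)
```

Note that psycopg2 uses the %s paramstyle while the Trino client uses ?, so each accessor would likely own its placeholder format.
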

Comment on lines 237 to 288
partition_values_lower = [f"'{self._provider_uuid}'", f"'{self._year}'", f"'{self._month}'"]
partition_values_upper = [f"'{self._provider_uuid}'", f"'{self._year}'", f"'{int(self._month)+1:02d}'"]
sql = get_report_db_accessor().get_partition_create_sql(self._schema_name, self._table_name, self._partition_name, partition_values_lower, partition_values_upper)


Severity: high

The calculation for partition_values_upper does not correctly handle the year-end rollover. When self._month is '12', int(self._month)+1 will become 13, which is an invalid month and will cause an error when creating partitions for December data.

You should use date arithmetic, for example with dateutil.relativedelta, to correctly calculate the start of the next month. You will also need to ensure the necessary imports are present in this file (import datetime and from dateutil.relativedelta import relativedelta).

Suggested change
-partition_values_lower = [f"'{self._provider_uuid}'", f"'{self._year}'", f"'{self._month}'"]
-partition_values_upper = [f"'{self._provider_uuid}'", f"'{self._year}'", f"'{int(self._month)+1:02d}'"]
-sql = get_report_db_accessor().get_partition_create_sql(self._schema_name, self._table_name, self._partition_name, partition_values_lower, partition_values_upper)
+start_date = datetime.date(int(self._year), int(self._month), 1)
+next_month_date = start_date + relativedelta(months=1)
+partition_values_lower = [f"'{self._provider_uuid}'", f"'{start_date.year}'", f"'{start_date.strftime('%m')}'"]
+partition_values_upper = [f"'{self._provider_uuid}'", f"'{next_month_date.year}'", f"'{next_month_date.strftime('%m')}'"]
+sql = get_report_db_accessor().get_partition_create_sql(self._schema_name, self._table_name, self._partition_name, partition_values_lower, partition_values_upper)
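
As a quick sanity check that relativedelta handles the year-end rollover (a standalone snippet; dateutil is the library the comment above recommends):

```python
import datetime

from dateutil.relativedelta import relativedelta

# December rolls over to January of the next year, which int(month) + 1 did not.
start = datetime.date(2025, 12, 1)
print(start + relativedelta(months=1))  # 2026-01-01
```
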

@ydayagi force-pushed the trino2postgres branch 2 times, most recently from d897f90 to 376790c on December 7, 2025 06:32
@ydayagi changed the title from "trino2postgres: connections, tables, schemas, partitions" to "trino2postgres" on Dec 15, 2025
@ydayagi force-pushed the trino2postgres branch 4 times, most recently from 4755d08 to 61820b7 on December 23, 2025 13:35
@ydayagi marked this pull request as ready for review on January 1, 2026 11:25
@ydayagi requested review from a team as code owners on January 1, 2026 11:25
@masayag commented Jan 5, 2026

@ydayagi
When ONPREM=true, the application switches to PostgreSQL queries but still creates the hive database through Django migrations. This suggests the hive infrastructure may be unnecessarily provisioned in on-premise deployments where it's not actively used. Can you confirm the hive DB isn't necessary for ONPREM and, if so, skip its creation and the migrations for it?

@lcouzens (Contributor) left a comment

So the initial plan for on-prem was OCP data only to start with. Is there a need to update all these cloud provider parts too (AWS, Azure, GCP)?

I'm concerned we are trying to change too much in one go rather than incremental steps. I'd rather see OCP work exclusively first before we try anything beyond that. It would also help during the review process too since it would be more focused.

In addition to that, were you able to run the full IQE integration tests on this locally and get passes? That's the first step I suggest we take: get the tests running and see what results we can get compared to main.

@@ -0,0 +1,14 @@
SELECT
Contributor

We have no plans today to do subs integration on-prem, so there shouldn't be any need to add Postgres versions of these files. (Everything under subs can be skipped.)

Collaborator (Author)

It was easier to convert everything than to pick only the strictly necessary parts. I think the tests and CI do not have a clear OCP-only distinction.

@masayag commented Jan 6, 2026

@ydayagi testing the image with the on-prem chart raises an error in the cost-onprem-celery-worker-ocp deployment:

[2026-01-06 11:24:55,954] INFO 68869c80-7a53-4593-a89f-f71afb519d08 68869c80-7a53-4593-a89f-f71afb519d08 d198b16e-f22a-4520-9b87-5e5a2dc44160 67 setting summary ManifestState.FAILED for manifest: 3
[2026-01-06 11:24:55,963] ERROR 68869c80-7a53-4593-a89f-f71afb519d08 68869c80-7a53-4593-a89f-f71afb519d08 d198b16e-f22a-4520-9b87-5e5a2dc44160 67 Task failed: syntax error at or near "LIKE"
LINE 1: SHOW TABLES LIKE 'openshift_storage_usage_line_items_daily'
                    ^
Traceback (most recent call last):
  File "/opt/koku/.venv/lib/python3.11/site-packages/django/db/backends/utils.py", line 87, in _execute
    return self.cursor.execute(sql)
           ^^^^^^^^^^^^^^^^^^^^^^^^
psycopg2.errors.SyntaxError: syntax error at or near "LIKE"
LINE 1: SHOW TABLES LIKE 'openshift_storage_usage_line_items_daily'
                    ^


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/koku/.venv/lib/python3.11/site-packages/celery/app/trace.py", line 479, in trace_task
    R = retval = fun(*args, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^

The ONPREM env variable is set to True for this deployment.


Edit: The issue was solved by rebuilding the image. It seems the image was outdated and didn't contain the latest state of this PR.
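
For context on the error above: PostgreSQL has no SHOW TABLES statement, which is why the Trino-style SHOW TABLES LIKE query fails; the information_schema check from get_table_check_sql earlier in this PR is the Postgres-compatible equivalent. A minimal standalone sketch, assuming a psycopg2 connection (the DSN and schema name are illustrative):

```python
import psycopg2

conn = psycopg2.connect("dbname=koku")  # illustrative DSN
with conn.cursor() as cur:
    # Postgres-compatible replacement for Trino's SHOW TABLES LIKE '...'.
    cur.execute(
        "SELECT 1 FROM information_schema.tables "
        "WHERE table_name = %s AND table_schema = %s",
        ("openshift_storage_usage_line_items_daily", "example_schema"),
    )
    table_exists = cur.fetchone() is not None
```
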

@@ -0,0 +1,154 @@
CREATE TABLE IF NOT EXISTS {{schema | sqlsafe}}.managed_aws_openshift_daily_temp
Contributor

So technically the SQL directory is also postgres_sql. Instead of postgres_sql as the directory name, can we change it to on_site, since this is SQL that will be running outside of the SaaS moving forward?

Collaborator (Author)

OK. Though the change is about Postgres instead of Trino; we do not have to tie it to on-site/on-prem.

@myersCody changed the title from "trino2postgres" to "Replace trino sql with postgresql on site SQL for on premise" on Jan 9, 2026
@ydayagi (Collaborator, Author) commented Jan 11, 2026

> @ydayagi When ONPREM=true, the application switches to PostgreSQL queries but still creates the hive database through Django migrations. This suggests the hive infrastructure may be unnecessarily provisioned in on-premise deployments where it's not actively used. Can you confirm the hive DB isn't necessary for ONPREM and, if so, skip its creation and the migrations for it?

Hive and Django were never used together, and Django is unaware of Hive. The Django DB API is only used for Postgres. After my changes, the use of Hive and Postgres via Trino is replaced by using Django. Even if you see the word "hive" in a log, it is not really Hive; it is just a log message that was not updated.

@ydayagi (Collaborator, Author) commented Jan 11, 2026

> So the initial plan for on-prem was OCP data only to start with. Is there a need to update all these cloud provider parts too (AWS, Azure, GCP)?
>
> I'm concerned we are trying to change too much in one go rather than incremental steps. I'd rather see OCP work exclusively first before we try anything beyond that. It would also help during the review process too since it would be more focused.
>
> In addition to that, were you able to run the full IQE integration tests on this locally and get passes? That's the first step I suggest we take: get the tests running and see what results we can get compared to main.

It was easier to convert everything. I think the tests (unit, IQE) do not have a 100% OCP-only set. In addition to that, I was told we are going to do OCP on cloud as well; otherwise, what's the point in cost mgmt? And if it is not going to be used, then there is no need to review or test it. But reverting it would be a great .
I tested with IQE. When I chose the test_api_ocp* tests, it also tested GCP, Azure, and AWS. All test_api_ocp_ingest* tests passed.

@ydayagi (Collaborator, Author) commented Jan 11, 2026

> @ydayagi testing the image with the on-prem chart raises an error in the cost-onprem-celery-worker-ocp deployment: [full error log quoted above]
>
> Edit: The issue was solved by rebuilding the image. It seems the image was outdated and didn't contain the latest state of this PR.

Sorry about that. I had a few fixes missing in the image.

FLPATH-2957
https://issues.redhat.com/browse/FLPATH-2957
- check_table_exists(): Use get_table_check_sql() instead of SHOW TABLES
- find_expired_partitions(): Use get_expired_data_ocp_sql() instead of $partitions metadata query with date_parse()
- drop_expired_partitions(): Use get_delete_by_month_sql() instead of hardcoded DELETE with hive. prefix
When ONPREM is True, use MockUnleashClient instead of real UnleashClient
to avoid dependency on Unleash service.
@ydayagi force-pushed the trino2postgres branch 5 times, most recently from 31abca3 to 14a37ce on January 14, 2026 15:43
@codecov bot commented Jan 14, 2026

Codecov Report

❌ Patch coverage is 60.63830% with 222 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.7%. Comparing base (0bdb1ce) to head (aa53670).
⚠️ Report is 1 commit behind head on main.

❌ Your patch check has failed because the patch coverage (60.6%) is below the target coverage (90.0%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@           Coverage Diff           @@
##            main   #5813     +/-   ##
=======================================
- Coverage   94.3%   93.7%   -0.6%     
=======================================
  Files        345     348      +3     
  Lines      29740   30155    +415     
  Branches    3239    3277     +38     
=======================================
+ Hits       28054   28260    +206     
- Misses      1098    1298    +200     
- Partials     588     597      +9     
