fix(utc): move DB config override to _apply_driver_defaults #198

Merged
utnapischtim merged 4 commits into inveniosoftware:master from palkerecsenyi:config-override-postgres-only
Mar 12, 2026

@palkerecsenyi
Member

  • In Flask-SQLAlchemy v3, apply_driver_hacks was renamed to _apply_driver_defaults, its signature changed slightly, and its return value is no longer used. Until now, we had not renamed our override of apply_driver_hacks, so it was not being called at all. I renamed our override and changed the signature to be compatible.

  • Moved the timezone command-line argument override to _apply_driver_defaults such that it only applies when a PostgreSQL DB is being used.

This should fix the issue in the unit tests in inveniosoftware/invenio-pidstore#178, I will test this.
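The silent failure described above can be reproduced without Flask-SQLAlchemy. The following is a minimal sketch, assuming a stand-in base class (`BaseV3`, hypothetical) that only ever calls the new hook name; it is not the library's actual implementation.

```python
class BaseV3:
    """Hypothetical stand-in for a base class that calls the renamed hook."""

    def _apply_driver_defaults(self, options, app):
        pass

    def make_options(self, app):
        options = {}
        # Only the new hook name is called; the return value is ignored.
        self._apply_driver_defaults(options, app)
        return options


class StaleOverride(BaseV3):
    # Old hook name: nothing in BaseV3 calls it, so this is dead code.
    def apply_driver_hacks(self, app, sa_url, options):
        options["patched"] = True
        return sa_url, options


class RenamedOverride(BaseV3):
    # New hook name and argument order: called during make_options().
    def _apply_driver_defaults(self, options, app):
        super()._apply_driver_defaults(options, app)
        options["patched"] = True


stale = StaleOverride().make_options(app=None)
renamed = RenamedOverride().make_options(app=None)
```

No error is raised for `StaleOverride`; its options simply stay empty, which is why the missing override went unnoticed.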

@palkerecsenyi palkerecsenyi marked this pull request as ready for review March 4, 2026 11:02
@palkerecsenyi palkerecsenyi requested a review from zzacharo March 4, 2026 11:02
@palkerecsenyi palkerecsenyi moved this to In review 🔍 in Sprint Q1/2026 Mar 4, 2026
- def apply_driver_hacks(self, app, sa_url, options):
+ def _apply_driver_defaults(self, options, app):
      """Call before engine creation."""
      # Don't forget to apply hacks defined on parent object.
Contributor

minor: you could update the comment since the method name has changed. hacks -> defaults

Member

@slint slint left a comment

LGTM, one minor thing about being a bit defensive in case connect_args is already set for some reason with a different value.

converters.conversions[LocalProxy] = escape_local_proxy
converters.encoders[LocalProxy] = escape_local_proxy

return sa_url, options
Member

Yikes, I thought for a minute, we're not actually applying anything, but it looks like options is an input-output argument now 👀

Contributor

as i understand Pal, we are not applying anything, since the name has been changed and the method is not called!

Member Author

Indeed we don't need to return anything anymore, it uses options as a pass-by-reference value as can be seen in the default implementation of the method. This was changed when they upgraded it from 2.5 -> 3.0.

https://github.com/pallets-eco/flask-sqlalchemy/blob/168cb4b7b50fe5176307a10d873781bfafc6eeda/src/flask_sqlalchemy/extension.py#L578-L645
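To illustrate why no return value is needed, here is a minimal sketch of an input-output dict argument. The two hook functions are hypothetical; the `-c timezone=UTC` string is the one mentioned in this thread.

```python
def hook_rebinds(options):
    # Rebinding the local name builds a new dict; the caller never sees it.
    options = {**options, "connect_args": {"options": "-c timezone=UTC"}}


def hook_mutates(options):
    # Mutating the dict that was passed in is visible to the caller.
    connect_args = options.setdefault("connect_args", {})
    connect_args.setdefault("options", "-c timezone=UTC")


engine_options = {"pool_pre_ping": True}

hook_rebinds(engine_options)
after_rebind = dict(engine_options)  # snapshot: still unchanged

hook_mutates(engine_options)
```

A hook that returns nothing therefore has to change `options` in place, as `hook_mutates` does.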

from psycopg2.extensions import adapt, register_adapter

connect_args = options.setdefault("connect_args", {})
if "options" not in connect_args:
Member

minor: should we be a bit "defensive" and log a warning in case this is already set with potentially other values? I'm not sure what the lifecycle of connect_args is, and if our timezone patch is not applied, one would have a hard time figuring out where/why this is not happening...

Contributor

is a warning enough in such a case?

Member Author

Good point, I agree it would be a very difficult issue to debug if an instance was overriding this and didn't notice the change. A warning should probably be enough since we still want to allow instances to override if needed, so we shouldn't raise an exception

Member

I see two approaches (in case options is already set):

  • A) we're "intrusive" and actually fix the already set options value and append the -c timezone=UTC string. on one hand one might argue that since the way we've built Invenio now, UTC timezone in Postgres is a "hard requirement" and thus without it you're basically running a "broken instance"
  • B) we fail hard here with an exception...

I'm not decided yet on whether, in either of the above approaches, we should also provide a way out for people to suppress the behavior...

Member

@slint slint Mar 4, 2026

forgot to mention: on Zenodo, e.g., we set application_name to the hostname of the client using SQLALCHEMY_ENGINE_OPTIONS, and I haven't tested whether it fails... we're doing this because we're using PgBouncer, so that we can tell better which clients connect to it: https://github.com/zenodo/zenodo-rdm/blob/72906255c1970970984af3cf2b4bc6f93bb87687/invenio.cfg#L149-L151

Member Author

The Zenodo config should be fine since we're specifically checking for the "options" key of connect_args here (corresponding to the libpq options key, not to be confused with the options parameter passed into the _apply_driver_defaults method) and only overriding that. So application_name would stay in connect_args without any changes.

In terms of the approaches, I think (A) might potentially be risky since it relies on us checking that the timezone isn't already in options. There are multiple ways to include it and a potentially infinite set of valid values that all mean UTC. It is indeed a hard requirement so trying to set any other timezone is wrong, but I think maybe it's more reliable to just give a warning message if we see any timezone being set.

Member

I'm afraid that a warning will be difficult for anyone running this on production systems to see (since we're not usually paying attention to logs)... On the other hand, since this is something that will be introduced in InvenioRDM v14, maybe we just add it to the migration guide, so that folks make sure to check in their environment, e.g., by just running any invenio ... command that accesses the DB.

I would say, let's go with warning and @utnapischtim we should add a note in the v14 upgrade notes.

Member Author

I've added the warning
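A rough sketch of the behaviour agreed on in this thread, not the merged code: the helper name `apply_utc_default`, the warning category, and the message text are assumptions. It sets the libpq `options` key only when it is absent, leaves other `connect_args` keys such as `application_name` alone, and warns instead of raising when `options` is already set.

```python
import warnings

UTC_OPTION = "-c timezone=UTC"


def apply_utc_default(options):
    """Set the UTC default in place unless connect_args['options'] exists."""
    connect_args = options.setdefault("connect_args", {})
    if "options" not in connect_args:
        connect_args["options"] = UTC_OPTION
    else:
        # Assumption: a warning, not an exception, so instances can override.
        warnings.warn(
            "connect_args['options'] is already set; "
            "the UTC timezone default was not applied.",
            RuntimeWarning,
            stacklevel=2,
        )


# Zenodo-like config: only application_name is set, so the default applies.
zenodo_like = {"connect_args": {"application_name": "client-host"}}
apply_utc_default(zenodo_like)

# Instance that already sets libpq options: value is kept, a warning is emitted.
custom = {"connect_args": {"options": "-c statement_timeout=5000"}}
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    apply_utc_default(custom)
```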

@utnapischtim
Contributor

when i use this on invenio-pidstore i get multiple sqlalchemy.exc.ProgrammingError: (psycopg2.errors.SyntaxError) syntax error at or near "PRAGMA" errors

please wait until i find out what the problem is

@utnapischtim
Contributor

@palkerecsenyi

at least for invenio-pidstore this doesn't work.

https://github.com/inveniosoftware/invenio-pidstore/actions/runs/22713086766/job/65855753343?pr=178

but the problem is mostly that we reactivated the unused driver_defaults function, which doesn't work for sqlite

@palkerecsenyi
Member Author

@utnapischtim I have pushed a commit to hopefully fix this issue. The pidstore unit tests seem to run both a PostgreSQL and an SQLite engine instance, and the same connect event listener ends up being used for both of them.

# Enable foreign key constraint checking
# In some unit tests, we might be using multiple engines with different DBs, and we want to
# make 100% sure this command only runs for sqlite.
if not dbapi_connection.__class__.__module__.startswith("sqlite3"):
Contributor

personally i don't like lines which use __class__.__module__ constructs. i worry that this is not future stable

we could add in line 164 following code:

            if event.contains(Engine, "connect", do_sqlite_connect):
                event.remove(Engine, "connect", do_sqlite_connect)
            if event.contains(Engine, "begin", do_sqlite_begin):
                event.remove(Engine, "begin", do_sqlite_begin)

which removes this function if it has been added before.

i think this will only be a problem in the tests.

Member Author

This might create a race condition:

  1. The engine is initialised with SQLite
  2. The engine is initialised with PostgreSQL, thereby deleting the SQLite event listener
  3. PostgreSQL manages to connect first (unlikely but possible)
  4. SQLite connects second but there's no listener, so the foreign key pragma is not set
  5. SQLite operates without relationship enforcement

As you say, this is a problem unique to unit tests since we don't use SQLite anywhere else, but this could still cause the tests to potentially fail on random occasions.

Contributor

yes, if the tests run in parallel, it could be a problem with a race condition, since event is a global construct!
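One way to keep a single global listener while avoiding the `__class__.__module__` check would be an `isinstance` test. This is only a sketch, assuming the listener receives the raw `sqlite3` connection object; `FakePostgresConnection` is a hypothetical stand-in and the boolean return value exists only to make the behaviour visible.

```python
import sqlite3


def do_sqlite_connect(dbapi_connection, connection_record=None):
    """Enable foreign key checks, but only on real SQLite connections."""
    if not isinstance(dbapi_connection, sqlite3.Connection):
        # Some other driver: sending PRAGMA there would be a syntax error.
        return False
    cursor = dbapi_connection.cursor()
    cursor.execute("PRAGMA foreign_keys=ON")
    cursor.close()
    return True


class FakePostgresConnection:
    """Hypothetical stand-in for a non-SQLite DBAPI connection."""


sqlite_conn = sqlite3.connect(":memory:")
applied_sqlite = do_sqlite_connect(sqlite_conn)
applied_other = do_sqlite_connect(FakePostgresConnection())
foreign_keys_on = sqlite_conn.execute("PRAGMA foreign_keys").fetchone()[0]
```

Because the listener is never removed, the ordering problem in the numbered steps above does not arise: each connection is checked individually when it connects.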

@slint
Member

slint commented Mar 6, 2026

After seeing all the discussion around messing with _apply_driver_{hacks,defaults}, does it maybe make sense to take a step back and just handle this in the init_db config defaults setting, like so:

# or whatever is the best method
is_postgres = app.config.get("SQLALCHEMY_DATABASE_URI", "").startswith("postgresql")
if is_postgres:
    app.config.setdefault("SQLALCHEMY_ENGINE_OPTIONS", ...)

I would prefer if we could actually remove these hacks, since they might be handled already in Flask-SQLAlchemy.
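The suggestion above could be fleshed out roughly as follows. This is a sketch under assumptions: the helper name `set_postgres_defaults` is hypothetical, a plain dict stands in for `app.config`, and the prefix check is only one possible way to detect PostgreSQL.

```python
def set_postgres_defaults(config):
    """Add the UTC timezone default to the engine options for PostgreSQL URIs."""
    uri = config.get("SQLALCHEMY_DATABASE_URI") or ""
    if not uri.startswith("postgresql"):
        # SQLite and other databases are left untouched.
        return
    engine_options = config.setdefault("SQLALCHEMY_ENGINE_OPTIONS", {})
    connect_args = engine_options.setdefault("connect_args", {})
    # setdefault keeps any value an instance has already configured.
    connect_args.setdefault("options", "-c timezone=UTC")


pg_config = {"SQLALCHEMY_DATABASE_URI": "postgresql+psycopg2://user@localhost/db"}
sqlite_config = {"SQLALCHEMY_DATABASE_URI": "sqlite:///test.db"}
empty_config = {}

for cfg in (pg_config, sqlite_config, empty_config):
    set_postgres_defaults(cfg)
```

Doing this at config time means no engine hook or connect listener is involved, so SQLite engines never see the PostgreSQL-specific option.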

@palkerecsenyi palkerecsenyi moved this from In review 🔍 to In progress in Sprint Q1/2026 Mar 12, 2026
@palkerecsenyi palkerecsenyi self-assigned this Mar 12, 2026
@palkerecsenyi
Member Author

@slint I have added a new commit removing the driver hacks and moving the default config to ext.py, which helps avoid any errors with SQLite tests in the docs.

With this, unit tests are now passing across all Invenio modules: https://palkerecsenyi.github.io/invenio-testrig-client/invenio-testrig-2026-03-12-13-18-00/index.html

@utnapischtim utnapischtim merged commit 71cafb6 into inveniosoftware:master Mar 12, 2026
3 checks passed
@github-project-automation github-project-automation bot moved this from In progress to To release 🤖 in Sprint Q1/2026 Mar 12, 2026
@palkerecsenyi palkerecsenyi deleted the config-override-postgres-only branch March 16, 2026 08:55
4 participants