fix(utc): move DB config override to _apply_driver_defaults#198
Conversation
* In FlaskSQLAlchemy v3, apply_driver_hacks was renamed to _apply_driver_defaults and the signature slightly changed, as well as the return value removed/ignored. Until now, we had not renamed our override of apply_driver_hacks, so it was not being called at all. I renamed our override and changed the signature to be compatible. * Moved the `timezone` command-line argument override to _apply_driver_defaults such that it only applies when a PostgreSQL DB is being used.
invenio_db/shared.py
Outdated
| def apply_driver_hacks(self, app, sa_url, options): | ||
| def _apply_driver_defaults(self, options, app): | ||
| """Call before engine creation.""" | ||
| # Don't forget to apply hacks defined on parent object. |
There was a problem hiding this comment.
minor: you could update the comment since the method name has changed. hacks -> defaults
slint
left a comment
There was a problem hiding this comment.
LGTM, one minor thing about being a bit defensive in case connect_args is already set for some reason with a difference value.
| converters.conversions[LocalProxy] = escape_local_proxy | ||
| converters.encoders[LocalProxy] = escape_local_proxy | ||
|
|
||
| return sa_url, options |
There was a problem hiding this comment.
Yikes, I thought for a minute, we're not actually applying anything, but it looks like options is an input-output argument now 👀
There was a problem hiding this comment.
as i understand Pal, we are not applying anything, since the name has been changed and the method is not called!
There was a problem hiding this comment.
Indeed we don't need to return anything anymore, it uses options as a pass-by-reference value as can be seen in the default implementation of the method. This was changed when they upgraded it from 2.5 -> 3.0.
invenio_db/shared.py
Outdated
| from psycopg2.extensions import adapt, register_adapter | ||
|
|
||
| connect_args = options.setdefault("connect_args", {}) | ||
| if "options" not in connect_args: |
There was a problem hiding this comment.
minor: should we be a bit "defensive" and log a warning in case this is already set with potentially other values? I'm not sure what the lifecycle of connect_args is, and if our timezone patch is not applied, one would have a hard time figuring out where/why this is not happening...
There was a problem hiding this comment.
is a warning enough in such a case?
There was a problem hiding this comment.
Good point, I agree it would be a very difficult issue to debug if an instance was overriding this and didn't notice the change. A warning should probably be enough since we still want to allow instances to override if needed, so we shouldn't raise an exception
There was a problem hiding this comment.
I see two approaches (in case options is already set):
- A) we're "intrusive" and actually fix the already set
optionsvalue and append the-c timezone=UTCstring. on one hand one might argue that since the way we've built Invenio now, UTC timezone in Postgres is a "hard requirement" and thus without it you're basically running a "broken instance" - B) we fail hard here with an exception...
I'm not decided yet on if in either of the above approaches, we should also provide a way out for people to supress the behavior...
There was a problem hiding this comment.
forgot to mention on Zenodo e.g. we set application_name to the hostname of the client using SQLALCHEMY_ENGINE_OPTIONS, which I haven't tested if it fails... we're doing this because we're using Pgbouncer though, so that we can know better whcih clients connect to it: https://github.com/zenodo/zenodo-rdm/blob/72906255c1970970984af3cf2b4bc6f93bb87687/invenio.cfg#L149-L151
There was a problem hiding this comment.
The Zenodo config should be fine since we're specifically checking for the "options" key of connect_args here (corresponding to the libpq options key, not to be confused with the options parameter passed into the _apply_driver_defaults method) and only overriding that. So application_name would stay in connect_args without any changes.
In terms of the approaches, I think (A) might potentially be risky since it relies on us checking that the timezone isn't already in options. There are multiple ways to include it and a potentially infinite set of valid values that all mean UTC. It is indeed a hard requirement so trying to set any other timezone is wrong, but I think maybe it's more reliable to just give a warning message if we see any timezone being set.
There was a problem hiding this comment.
I'm afraid that a warning will be difficult for anyone running this on production systems to see (since we're not usually paying attention at logs)... On the other hand, since this is something that will be introduced in InvenioRDM v14, maybe we just add it to the migration guide, so that folks make sure to check in their environment, e.g., by just running any invenio ... command that accesses the DB.
I would say, let's go with warning and @utnapischtim we should add a note in the v14 upgrade notes.
There was a problem hiding this comment.
I've added the warning
|
when i use this on invenio-pidstore i get multiple please wait until i found out what the problem is |
|
at least for invenio-pidstore this doesn't work. https://github.com/inveniosoftware/invenio-pidstore/actions/runs/22713086766/job/65855753343?pr=178 but the problem is mostly that we reactivated the the not used driver_defaults function which for sqlite doesn't work |
|
@utnapischtim I have pushed a commit to hopefully fix this issue, it seems to be running both a PostgreSQL and SQLite engine instance during the pidstore unit tests, and it gets confused by trying to use the same |
invenio_db/shared.py
Outdated
| # Enable foreign key constraint checking | ||
| # In some unit tests, we might be using multiple engines with different DBs, and we want to | ||
| # make 100% sure this command only runs for sqlite. | ||
| if not dbapi_connection.__class__.__module__.startswith("sqlite3"): |
There was a problem hiding this comment.
personally i don't like lines which use __class__.__module__ constructs. i worry that this is not future stable
we could add in line 164 following code:
if event.contains(Engine, "connect", do_sqlite_connect):
event.remove(Engine, "connect", do_sqlite_connect)
if event.contains(Engine, "begin", do_sqlite_begin):
event.remove(Engine, "begin", do_sqlite_begin)
which removes this function if it has been added before.
i think this will only be a problem in the tests.
There was a problem hiding this comment.
This might create a race condition:
- The engine is initialised with SQLite
- The engine is initialised with PostgreSQL, thereby deleting the SQLite event listener
- PostgreSQL manages to connect first (unlikely but possible)
- SQLite connects second but there's no listener, so the foreign key pragma is not set
- SQLite operates without relationship enforcement
As you say, this is a problem unique to unit tests since we don't use SQLite anywhere else, but this could still cause the tests to potentially fail on random occasions.
There was a problem hiding this comment.
yes, if the tests execution will run in parallel, it could be a problem with a race condition, since event is a global construct!
|
After seeing a bit all the discussions with messing with the # or whatever is the best method
is_postgres = app.config.get("SQLALCHEMY_DATABASE_URI").starswith("postgres://")
if is_postgres:
app.config.setdefault("SQLALCHEMY_ENGINE_OPTIONS", ...)I would prefer if we could actually remove these hacks, since they might be handled already in Flask-SQLAlchemy. |
|
@slint I have added a new commit removing the driver hacks and moving the default config to With this, unit tests are now passing across all Invenio modules: https://palkerecsenyi.github.io/invenio-testrig-client/invenio-testrig-2026-03-12-13-18-00/index.html |
In FlaskSQLAlchemy v3, apply_driver_hacks was renamed to _apply_driver_defaults and the signature slightly changed, as well as the return value removed/ignored. Until now, we had not renamed our override of apply_driver_hacks, so it was not being called at all. I renamed our override and changed the signature to be compatible.
Moved the
timezonecommand-line argument override to _apply_driver_defaults such that it only applies when a PostgreSQL DB is being used.This should fix the issue in the unit tests in inveniosoftware/invenio-pidstore#178, I will test this.