7 changes: 0 additions & 7 deletions invenio_db/ext.py
@@ -146,13 +146,6 @@ def init_db(self, app, entry_point_group="invenio_db.models", **kwargs):
         app.config.setdefault("SQLALCHEMY_ECHO", False)
         # Needed for before/after_flush/commit/rollback events
         app.config.setdefault("SQLALCHEMY_TRACK_MODIFICATIONS", True)
-        app.config.setdefault(
-            "SQLALCHEMY_ENGINE_OPTIONS",
-            # Ensure the database is using the UTC timezone for interpreting timestamps (Postgres only).
-            # This overrides any default setting (e.g. in postgresql.conf). Invenio expects the DB to receive
-            # and provide UTC timestamps in all cases, so it's important that this doesn't get changed.
-            {"connect_args": {"options": "-c timezone=UTC"}},
-        )

         # Initialize Flask-SQLAlchemy extension.
         database = kwargs.get("db", db)
38 changes: 32 additions & 6 deletions invenio_db/shared.py
@@ -9,11 +9,13 @@

 """Shared database object for Invenio."""

+import re
+import warnings
 from datetime import datetime, timezone

 from flask_sqlalchemy import SQLAlchemy as FlaskSQLAlchemy
 from sqlalchemy import Column, MetaData, event, util
-from sqlalchemy.engine import Engine
+from sqlalchemy.engine import Engine, make_url
 from sqlalchemy.sql import text
 from sqlalchemy.types import DateTime, TypeDecorator
 from werkzeug.local import LocalProxy
@@ -129,10 +131,12 @@ def __getattr__(self, name):

         return super().__getattr__(name)

-    def apply_driver_hacks(self, app, sa_url, options):
+    def _apply_driver_defaults(self, options, app):
         """Call before engine creation."""
-        # Don't forget to apply hacks defined on parent object.
-        super(SQLAlchemy, self).apply_driver_hacks(app, sa_url, options)
+        # Don't forget to apply defaults defined on parent object.
+        super(SQLAlchemy, self)._apply_driver_defaults(options, app)
+
+        sa_url = make_url(options["url"])

         if sa_url.drivername == "sqlite":
             connect_args = options.setdefault("connect_args", {})
@@ -158,6 +162,30 @@ def adapt_proxy(proxy):
         elif sa_url.drivername == "postgresql+psycopg2":  # pragma: no cover
             from psycopg2.extensions import adapt, register_adapter

+            connect_args = options.setdefault("connect_args", {})
+            options_override = "-c timezone=UTC"
+            if "options" not in connect_args:
Member

minor: should we be a bit "defensive" and log a warning in case this is already set with potentially other values? I'm not sure what the lifecycle of connect_args is, and if our timezone patch is not applied, one would have a hard time figuring out where/why this is not happening...

Contributor

is a warning enough in such a case?

Member Author

Good point, I agree it would be a very difficult issue to debug if an instance was overriding this and didn't notice the change. A warning should probably be enough since we still want to allow instances to override if needed, so we shouldn't raise an exception

Member

I see two approaches (in case options is already set):

  • A) We're "intrusive": we fix the already-set options value by appending the -c timezone=UTC string. One might argue that, given the way we've built Invenio now, a UTC timezone in Postgres is a "hard requirement", so without it you're basically running a "broken instance".
  • B) We fail hard here with an exception.

I haven't decided yet whether, in either of the above approaches, we should also provide a way out for people to suppress the behavior...

Member

@slint, Mar 4, 2026

Forgot to mention: on Zenodo, for example, we set application_name to the hostname of the client using SQLALCHEMY_ENGINE_OPTIONS, and I haven't tested whether that fails. We're doing this because we're using PgBouncer, so that we can better tell which clients connect to it: https://github.com/zenodo/zenodo-rdm/blob/72906255c1970970984af3cf2b4bc6f93bb87687/invenio.cfg#L149-L151

Member Author

The Zenodo config should be fine since we're specifically checking for the "options" key of connect_args here (corresponding to the libpq options key, not to be confused with the options parameter passed into the _apply_driver_defaults method) and only overriding that. So application_name would stay in connect_args without any changes.

In terms of the approaches, I think (A) might potentially be risky since it relies on us checking that the timezone isn't already in options. There are multiple ways to include it and a potentially infinite set of valid values that all mean UTC. It is indeed a hard requirement so trying to set any other timezone is wrong, but I think maybe it's more reliable to just give a warning message if we see any timezone being set.
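A minimal sketch of the behaviour described in this reply, assuming a plain dict stands in for the engine options. The helper name `apply_timezone_default` and the `worker-1` hostname are hypothetical and not part of invenio-db. It only illustrates that the libpq `options` key is the one being checked, so other `connect_args` entries such as `application_name` are left alone.

```python
OPTIONS_OVERRIDE = "-c timezone=UTC"


def apply_timezone_default(engine_options):
    """Set the libpq ``options`` string only when the instance has not set one.

    Hypothetical helper for illustration; it mutates and returns the dict.
    """
    connect_args = engine_options.setdefault("connect_args", {})
    if "options" not in connect_args:
        connect_args["options"] = OPTIONS_OVERRIDE
    return engine_options


# Zenodo-style config: only application_name is set, so the default is added
# next to it and application_name is preserved.
zenodo_like = {"connect_args": {"application_name": "worker-1"}}
apply_timezone_default(zenodo_like)

# An instance that already passes libpq options keeps them untouched.
custom = {"connect_args": {"options": "-c statement_timeout=5000"}}
apply_timezone_default(custom)
```

In the second case the PR emits a warning instead of rewriting the string, which sidesteps having to recognise every spelling of a UTC setting.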

Member

I'm afraid that a warning will be difficult for anyone running this on production systems to see (since we're not usually paying attention to logs)... On the other hand, since this is something that will be introduced in InvenioRDM v14, maybe we just add it to the migration guide, so that folks make sure to check in their environment, e.g., by just running any invenio ... command that accesses the DB.

I would say, let's go with warning and @utnapischtim we should add a note in the v14 upgrade notes.

Member Author

I've added the warning

+                # Ensure the database is using the UTC timezone for interpreting timestamps (PostgreSQL only).
+                # This overrides any default setting (e.g. in postgresql.conf). Invenio expects the DB to receive
+                # and provide UTC timestamps in all cases, so it's important that this doesn't get changed.
+                connect_args["options"] = options_override
+            elif (
+                # Check that the exact correct timezone override is not in the existing `options`.
+                # The regex simply checks that the override is either at the end of the string or has
+                # a space after it. Otherwise, a value like `-c timezone=UTC+3` would still match.
+                # If the app is in dev mode and is auto-reloading, the correct timezone will have been added
+                # already by the code above, so we want to avoid showing a warning.
+                not re.search(
+                    rf"{re.escape(options_override)}( |$)", connect_args["options"]
+                )
+            ):
+                warnings.warn(
+                    "It looks like you are manually passing command line options to libpq (via `connect_args.options` in `SQLALCHEMY_ENGINE_OPTIONS`). "
+                    "To avoid unexpected behaviour, InvenioDB won't add an override to these options to set the time zone to UTC. "
+                    "Please note that PostgreSQL databases used with Invenio must be in UTC. If your database or connection is configured with a non-UTC "
+                    "timezone, please change this before continuing to avoid unexpected behaviour."
+                )
+
             def adapt_proxy(proxy):
                 """Get current object and try to adapt it again."""
                 return adapt(proxy._get_current_object())
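The regex check in this hunk can be exercised in isolation. The sketch below reuses the same pattern; the wrapper `has_utc_override` is a hypothetical name introduced only for illustration. The last call assumes that a differently spelled setting name is equivalent on the PostgreSQL side, in which case the warning would still fire because the match is exact and case-sensitive.

```python
import re

OPTIONS_OVERRIDE = "-c timezone=UTC"
# Same pattern as in the hunk: the override must be followed by a space
# or by the end of the string.
PATTERN = re.compile(rf"{re.escape(OPTIONS_OVERRIDE)}( |$)")


def has_utc_override(libpq_options):
    """Return True when the exact override string is present."""
    return PATTERN.search(libpq_options) is not None


# Override alone, e.g. after a dev-mode auto-reload: no warning needed.
alone = has_utc_override("-c timezone=UTC")
# Override followed by another option: still recognised.
followed = has_utc_override("-c timezone=UTC -c statement_timeout=5000")
# A different zone that merely starts with the override text: not recognised.
offset = has_utc_override("-c timezone=UTC+3")
# Unrelated options only: not recognised, so the warning would be emitted.
unrelated = has_utc_override("-c statement_timeout=5000")
# Different spelling of the setting name: not recognised by the exact match.
respelled = has_utc_override("-c TimeZone=UTC")
```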
@@ -178,8 +206,6 @@ def escape_local_proxy(val, mapping):
             converters.conversions[LocalProxy] = escape_local_proxy
             converters.encoders[LocalProxy] = escape_local_proxy

-        return sa_url, options
Member

Yikes, I thought for a minute, we're not actually applying anything, but it looks like options is an input-output argument now 👀

Contributor

As I understand Pal, we are not applying anything, since the name has been changed and the method is not called!

Member Author

Indeed, we don't need to return anything anymore: the method receives options by reference and mutates it in place, as can be seen in the default implementation of the method. This changed when Flask-SQLAlchemy went from 2.5 to 3.0.

https://github.com/pallets-eco/flask-sqlalchemy/blob/168cb4b7b50fe5176307a10d873781bfafc6eeda/src/flask_sqlalchemy/extension.py#L578-L645
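A simplified sketch of the in-place pattern described in this reply, using stand-in classes rather than the real Flask-SQLAlchemy extension. The class names, the `parent_default` key, and the example URL are hypothetical. It shows why dropping `return sa_url, options` is safe when the caller keeps using the dict it passed in.

```python
class BaseExtension:
    """Stand-in for the parent extension (hypothetical, not Flask-SQLAlchemy)."""

    def _apply_driver_defaults(self, options, app):
        # The parent fills in its defaults on the same dict and returns nothing.
        options.setdefault("parent_default", True)


class InvenioLikeExtension(BaseExtension):
    """Stand-in for the subclass in this PR, without the warning branch."""

    def _apply_driver_defaults(self, options, app):
        super()._apply_driver_defaults(options, app)
        connect_args = options.setdefault("connect_args", {})
        connect_args.setdefault("options", "-c timezone=UTC")
        # No return value: changes are visible through the caller's dict.


engine_options = {"url": "postgresql+psycopg2://localhost/invenio"}
result = InvenioLikeExtension()._apply_driver_defaults(engine_options, app=None)
```

After the call, `result` is `None` while `engine_options` carries both the parent's default and the timezone override.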



 def do_sqlite_connect(dbapi_connection, connection_record):
     """Ensure SQLite checks foreign key constraints.