-
Notifications
You must be signed in to change notification settings - Fork 14
Logging context #245
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Logging context #245
Changes from 9 commits
00685aa
3aefc4b
1c34af7
e60b16a
ecce1b9
fd06cb8
5e007eb
8eea4da
84144c8
4448637
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -153,3 +153,5 @@ dev/cleanup.py | |
|
|
||
| .python-version | ||
| .databricks-login.json | ||
|
|
||
| .test_*.ipynb | ||
| Original file line number | Diff line number | Diff line change | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,179 @@ | ||||||||||
| """internall plumbing for passing logging context (dict) to logger instances""" | ||||||||||
|
|
||||||||||
| import dataclasses | ||||||||||
| import inspect | ||||||||||
| import logging | ||||||||||
| from concurrent.futures import ThreadPoolExecutor | ||||||||||
| from contextlib import contextmanager | ||||||||||
| from contextvars import ContextVar | ||||||||||
| from functools import partial, wraps | ||||||||||
| from types import MappingProxyType | ||||||||||
| from typing import TYPE_CHECKING, Annotated, Any, TypeVar, get_origin | ||||||||||
|
|
||||||||||
| if TYPE_CHECKING: | ||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In general we don't use I think this code will work fine without the guard? |
||||||||||
| T = TypeVar("T") | ||||||||||
| # SkipLogging[list[str]] will be treated by type checkers as list[str], because that's what Annotated is on runtime | ||||||||||
| # if this workaround is not in place, caller will complain about having wrong typing | ||||||||||
| # https://github.com/python/typing/discussions/1229 | ||||||||||
| SkipLogging = Annotated[T, ...] | ||||||||||
| else: | ||||||||||
|
|
||||||||||
| @dataclasses.dataclass(slots=True) | ||||||||||
| class SkipLogging: | ||||||||||
| """`@logging_context_params` will ignore parameters annotated with this class.""" | ||||||||||
|
|
||||||||||
| def __class_getitem__(cls, item: Any) -> Any: | ||||||||||
| return Annotated[item, SkipLogging()] | ||||||||||
|
|
||||||||||
|
|
||||||||||
| _CTX: ContextVar = ContextVar("ctx", default={}) | ||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The name of the variable and the first parameter are supposed to match, for example:
Suggested change
That said, there's a bigger safety issue we need to think about here, because the dictionary holding the context is mutable. This leads to some weird consequences:
From a practical point of view, I can see most of the code is avoiding this issue by creating new dictionaries, but there's a path to problems via Some things that would help here:
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. [Later note: I've thought about this some more, and also think it's safest to have no default value, but rather use
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. good catch with that, let me handle default internally and now worry about {} |
||||||||||
|
|
||||||||||
|
|
||||||||||
| def _params_str(params: dict[str, Any]): | ||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||||
| return ", ".join(f"{k}={v!r}" for k, v in params.items()) | ||||||||||
|
|
||||||||||
|
|
||||||||||
| def _get_skip_logging_param_names(sig: inspect.Signature): | ||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Please annotate the return-type. |
||||||||||
| """Generates list of parameters names having SkipLogging annotation""" | ||||||||||
| for name, param in sig.parameters.items(): | ||||||||||
| ann = param.annotation | ||||||||||
|
|
||||||||||
| # only consider annotation | ||||||||||
| if not ann or get_origin(ann) is not Annotated: | ||||||||||
| continue | ||||||||||
|
|
||||||||||
| # there can be many annotations for each param | ||||||||||
| for meta in ann.__metadata__: | ||||||||||
| # type checker thinks SkipLogging is a generic, despite it being Annotated | ||||||||||
| if meta and isinstance(meta, SkipLogging): # type: ignore | ||||||||||
| yield name | ||||||||||
|
|
||||||||||
|
|
||||||||||
| def _skip_dict_key(params: dict, keys_to_skip: set): | ||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can we type-hint this properly, please? |
||||||||||
| return {k: v for k, v in params.items() if k not in keys_to_skip} | ||||||||||
|
|
||||||||||
|
|
||||||||||
| def current_context(): | ||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Once |
||||||||||
| """Returns dictionary of current context set via `with loggin_context(...)` context manager or `@logging_context_params` decorator | ||||||||||
| Example: | ||||||||||
| current_context() | ||||||||||
| >>> {'foo': 'bar', 'a': 2} | ||||||||||
| """ | ||||||||||
| return _CTX.get() | ||||||||||
|
Comment on lines
+56
to
+64
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As discussed elsewhere, this is currently problematic because it's a public API and returns mutable (internal state). The logging facility is often relied upon as a source of truth when debugging confusing and unusual situations, so this really needs to be handled in a defensive manner so that we know it's not (itself) a further source of problems. (This is definitely fixable, but I think it's sufficiently important to emphasise why it should be addressed.) |
||||||||||
|
|
||||||||||
|
|
||||||||||
| def current_context_repr(): | ||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is this used anywhere? I don't see any references to it, maybe left over from debugging? |
||||||||||
| """Returns repr like "key1=val1, key2=val2" string representation of current_context(), or "" in case context is empty""" | ||||||||||
| return _params_str(current_context()) | ||||||||||
|
|
||||||||||
|
|
||||||||||
| @contextmanager | ||||||||||
| def logging_context(**kwds): | ||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Type-hint please, for the return-type. (Without this the body isn't checked.) |
||||||||||
| """Context manager adding keywords to current loging context. Thread and async safe. | ||||||||||
| Example: | ||||||||||
| with logging_context(foo="bar", a=2): | ||||||||||
| logger.info("hello") | ||||||||||
| >>> 2025-06-06 07:15:09,329 - __main__ - INFO - hello (foo='bar', a=2) | ||||||||||
| """ | ||||||||||
| # Get the current context and update it with new keywords | ||||||||||
| current_ctx = _CTX.get() | ||||||||||
| new_ctx = {**current_ctx, **kwds} | ||||||||||
| token = _CTX.set(MappingProxyType(new_ctx)) | ||||||||||
| try: | ||||||||||
| yield _CTX.get() | ||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmm. This is another place where we expose the (mutable) internal state. Stepping back a bit, does it need to yield anything? (I have an open mind here… I don't see why users would want to capture the current context, but also don't see any reason to prevent it, assuming we can do it safely.) I suspect it should either be:
Suggested change
(If there's no real use-case.) or
Suggested change
(If we want to expose it, it needs to be done safely.) |
||||||||||
| except Exception as e: | ||||||||||
| # python 3.11+: https://docs.python.org/3.11/tutorial/errors.html#enriching-exceptions-with-notes | ||||||||||
| # https://peps.python.org/pep-0678/ | ||||||||||
| if hasattr(e, "add_note"): | ||||||||||
| # __notes__ list[str] is only defined if notes were added, otherwise is not there | ||||||||||
| # we only want to add note if there was no note before, otherwise chaining context cause sproblems | ||||||||||
| if not getattr(e, "__notes__", None): | ||||||||||
| e.add_note(f"Context: {_params_str(current_context())}") | ||||||||||
|
|
||||||||||
| raise | ||||||||||
| finally: | ||||||||||
| _CTX.reset(token) | ||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is good to see, although bear in mind my earlier comments that the |
||||||||||
|
|
||||||||||
|
|
||||||||||
| def logging_context_params(func=None, **extra_context): | ||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Type-hints, please. ;) |
||||||||||
| """Decorator that automatically adds all the function parameters to current logging context. | ||||||||||
| Any passed keyward arguments in will be added to the context. Function parameters take precendnce over the extra keywords in case the names would overlap. | ||||||||||
| Parameters annotated with `SkipLogging` will be ignored from being added to the context. | ||||||||||
| Example: | ||||||||||
| @logging_context_params(foo="bar") | ||||||||||
| def do_math(a: int, b: SkipLogging[int]): | ||||||||||
| r = pow(a, b) | ||||||||||
| logger.info(f"result of {a}**{b} is {r}") | ||||||||||
| return r | ||||||||||
| >>> 2025-06-06 07:15:09,329 - __main__ - INFO - result of 2**8 is 256 (foo='bar', a=2) | ||||||||||
| Note: | ||||||||||
| - `a` parameter will be logged, type annotation is optional | ||||||||||
| - `b` parameter wont be logged because is it is annotated with `SkipLogging` | ||||||||||
| - `foo` parameter will be logged because it is passed as extra context to the decorator | ||||||||||
| """ | ||||||||||
|
|
||||||||||
| if func is None: | ||||||||||
| return partial(logging_context_params, **extra_context) | ||||||||||
|
|
||||||||||
| # will use function's singature to bind positional params to name of the param | ||||||||||
| sig = inspect.signature(func) | ||||||||||
| skip_params = set(_get_skip_logging_param_names(sig)) | ||||||||||
|
|
||||||||||
| @wraps(func) | ||||||||||
| def wrapper(*args, **kwds): | ||||||||||
| # only bind if there are positional args | ||||||||||
| # extra context has lower priority than any of the args | ||||||||||
| # skip_params is used to filter out parameters that are annotated with SkipLogging | ||||||||||
|
|
||||||||||
| if args: | ||||||||||
| bound = sig.bind(*args, **kwds) | ||||||||||
| ctx_data = {**extra_context, **_skip_dict_key(bound.arguments, skip_params)} | ||||||||||
| else: | ||||||||||
| ctx_data = {**extra_context, **_skip_dict_key(kwds, skip_params)} | ||||||||||
|
|
||||||||||
| with logging_context(**ctx_data): | ||||||||||
| return func(*args, **kwds) | ||||||||||
|
|
||||||||||
| return wrapper | ||||||||||
|
|
||||||||||
|
|
||||||||||
| class LoggingContextInjectingFilter(logging.Filter): | ||||||||||
| """Adds current_context() to the log record.""" | ||||||||||
|
|
||||||||||
| def filter(self, record): | ||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Type-hints, please. |
||||||||||
| # https://docs.python.org/3/howto/logging-cookbook.html#using-filters-to-impart-contextual-information | ||||||||||
| # https://docs.python.org/3/howto/logging-cookbook.html#use-of-contextvars | ||||||||||
| ctx = current_context() | ||||||||||
| record.context = f"{_params_str(ctx)}" if ctx else "" | ||||||||||
| record.context_msg = f" ({record.context})" if record.context else "" | ||||||||||
|
Comment on lines
+157
to
+158
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm curious why we're attaching the context rendered as a string? I would think attaching it as a dictionary is preferable? Maybe something like:
Suggested change
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I've put it as string to remove the quotes around name of key, it takes less space this way and feels easier to read: d = { 'process': 'a', 'name': 'Alice', 'priority': 70}
# {'process': 'a', 'name': 'Alice', 'priority': 70}
print(d)
# {'process': 'a', 'name': 'Alice', 'priority': 70}
print(repr(d))
# process = 'a', name = 'Alice', priority = 70
print(", ".join(f"{k} = {v!r}" for k, v in d.items()))but all in all, I don't have strong opinion |
||||||||||
| return True | ||||||||||
|
|
||||||||||
|
|
||||||||||
| class LoggingThreadPoolExecutor(ThreadPoolExecutor): | ||||||||||
| """ThreadPoolExecutor drop in replacement that will apply current loging context to all new started threads.""" | ||||||||||
|
|
||||||||||
| def __init__(self, max_workers=None, thread_name_prefix="", initializer=None, initargs=()): | ||||||||||
| self.__current_context = current_context() | ||||||||||
| self.__wrapped_initializer = initializer | ||||||||||
|
|
||||||||||
| super().__init__( | ||||||||||
| max_workers=max_workers, | ||||||||||
| thread_name_prefix=thread_name_prefix, | ||||||||||
| initializer=self._logging_context_init, | ||||||||||
| initargs=initargs, | ||||||||||
| ) | ||||||||||
|
|
||||||||||
| def _logging_context_init(self, *args): | ||||||||||
| _CTX.set(self.__current_context) | ||||||||||
| if self.__wrapped_initializer: | ||||||||||
| self.__wrapped_initializer(*args) | ||||||||||
|
Comment on lines
+162
to
+179
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I was mildly surprised that we need this, I thought that the With this in mind, you're correct that context variables aren't properly transferred into execution contexts in other threads. I suspect this is partly because it's not always clear at which point the context should be preferred. For example, using a basic Returning to the situation at hand: I'd like to see this implemented for
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would prefer to patch this in this PR, otherwise it's quite hard to prove usefulness of this feature when we don't have tests for multithreading, and only then I see benefits of using this feature. On non multithreading code, just reading top down logs is enough to understand context. I can make it working in submit(), will update code soon. |
||||||||||
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -4,6 +4,23 @@ | |||||||||||||||||||||||||||
| import sys | ||||||||||||||||||||||||||||
| from typing import TextIO | ||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||
| from ._logging_context import ( | ||||||||||||||||||||||||||||
| LoggingContextInjectingFilter, | ||||||||||||||||||||||||||||
| SkipLogging, | ||||||||||||||||||||||||||||
| current_context, | ||||||||||||||||||||||||||||
| logging_context, | ||||||||||||||||||||||||||||
| logging_context_params, | ||||||||||||||||||||||||||||
| ) | ||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||
| __all__ = [ | ||||||||||||||||||||||||||||
| "NiceFormatter", | ||||||||||||||||||||||||||||
| "install_logger", | ||||||||||||||||||||||||||||
| "current_context", | ||||||||||||||||||||||||||||
| "SkipLogging", | ||||||||||||||||||||||||||||
| "logging_context_params", | ||||||||||||||||||||||||||||
| "logging_context", | ||||||||||||||||||||||||||||
| ] | ||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||
| class NiceFormatter(logging.Formatter): | ||||||||||||||||||||||||||||
| """A nice formatter for logging. It uses colors and bold text if the console supports it.""" | ||||||||||||||||||||||||||||
|
|
@@ -36,7 +53,7 @@ def __init__(self, *, probe_tty: bool = False, stream: TextIO = sys.stdout) -> N | |||||||||||||||||||||||||||
| stream: the output stream to which the formatter will write, used to check if it is a console. | ||||||||||||||||||||||||||||
| probe_tty: If true, the formatter will enable color support if the output stream appears to be a console. | ||||||||||||||||||||||||||||
| """ | ||||||||||||||||||||||||||||
| super().__init__(fmt="%(asctime)s %(levelname)s [%(name)s] %(message)s", datefmt="%H:%M:%S") | ||||||||||||||||||||||||||||
| super().__init__(fmt="%(asctime)s %(levelname)s [%(name)s] %(message)s%(context_msg)s", datefmt="%H:%M:%S") | ||||||||||||||||||||||||||||
| # Used to colorize the level names. | ||||||||||||||||||||||||||||
| self._levels = { | ||||||||||||||||||||||||||||
| logging.DEBUG: self._bold(f"{self.CYAN} DEBUG"), | ||||||||||||||||||||||||||||
|
|
@@ -88,7 +105,12 @@ def format(self, record: logging.LogRecord) -> str: | |||||||||||||||||||||||||||
| color_marker = self._msg_colors[record.levelno] | ||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||
| thread_name = f"[{record.threadName}]" if record.threadName != "MainThread" else "" | ||||||||||||||||||||||||||||
| return f"{self.GRAY}{timestamp}{self.RESET} {level} {color_marker}[{name}]{thread_name} {msg}{self.RESET}" | ||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||
| # safe check, just in case injection filter is removed | ||||||||||||||||||||||||||||
| context_repr = record.context if hasattr(record, "context") else "" | ||||||||||||||||||||||||||||
| context_msg = f" {self.GRAY}({context_repr}){self.RESET}" if context_repr else "" | ||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||
| return f"{self.GRAY}{timestamp}{self.RESET} {level} {color_marker}[{name}]{thread_name} {msg}{self.RESET}{context_msg}" | ||||||||||||||||||||||||||||
|
Comment on lines
+108
to
+113
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmm, how about?
Suggested change
|
||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||
| def install_logger( | ||||||||||||||||||||||||||||
|
|
@@ -102,6 +124,7 @@ def install_logger( | |||||||||||||||||||||||||||
| - All existing handlers will be removed. | ||||||||||||||||||||||||||||
| - A new handler will be installed with our custom formatter. It will be configured to emit logs at the given level | ||||||||||||||||||||||||||||
| (default: DEBUG) or higher, to the specified stream (default: sys.stderr). | ||||||||||||||||||||||||||||
| - A new (injection) filter for adding logger_context will be added, that will add `context` with current context, to all logger messages. | ||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||
| Args: | ||||||||||||||||||||||||||||
| level: The logging level to set for the console handler. | ||||||||||||||||||||||||||||
|
|
@@ -115,6 +138,8 @@ def install_logger( | |||||||||||||||||||||||||||
| root.removeHandler(handler) | ||||||||||||||||||||||||||||
| console_handler = logging.StreamHandler(stream) | ||||||||||||||||||||||||||||
| console_handler.setFormatter(NiceFormatter(stream=stream)) | ||||||||||||||||||||||||||||
| console_handler.addFilter(LoggingContextInjectingFilter()) | ||||||||||||||||||||||||||||
| console_handler.setLevel(level) | ||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||
| root.addHandler(console_handler) | ||||||||||||||||||||||||||||
| return console_handler | ||||||||||||||||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Here the plot thickens… ideally we'd capture the context (as discussed earlier) when the task to run is being created, but by the time we reach here that's already happened. That said, we should be capturing the caller context during task submission and ensuring that when each task runs on the other thread, it starts within the captured context. (The task may modify it further, of-course.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.