feat: add JSON formatter config for Logstash / Elasticsearch ingestion #57

Open
tsu-shiuan wants to merge 3 commits into master from feat/json-log-handler

Conversation

@tsu-shiuan tsu-shiuan commented May 2, 2026

Summary

Adds Zexbox.Logging.JsonFormatter.config/1 (and the Zexbox.Logging.json_formatter_config/1 delegate). Returns a :logger formatter tuple wrapping LoggerJSON.Formatters.Basic with Zappi-standard defaults — splat into config :logger, :default_handler, formatter: ... in runtime.exs. Every log event becomes a single JSON line so multi-line content (struct inspections, multi-line SQL, stack traces) collapses into one Elasticsearch document at the ingest layer instead of fanning out into many.

This is the Phoenix / Elixir equivalent of opsbox's JsonFormatter for Ruby.

Why

Today filebeat ships every line of every container as its own ES document. Phoenix apps emit multi-line plain text (default Logger.bare_format), so a single struct inspection or SQL INSERT ... VALUES (...), (...), ... statement materialises as dozens of indexed docs. Production analysis showed live-api emitting ~2.9M docs/day from this fan-out alone, and data-stories ~580k/day from per-line %Phoenix.Socket{} struct dumps.

The Ruby pipeline doesn't have this problem because opsbox's JsonFormatter already emits one JSON line per event, and Logstash's if [kubernetes][container][name] == "rails" block JSON-parses each line into structured [log] fields. This PR brings that same pattern to Phoenix.

Usage

In config/runtime.exs of the consuming Phoenix app:

import Config

if config_env() == :prod do
  config :logger, :default_handler,
    formatter: Zexbox.Logging.json_formatter_config()
end

Output (one line per event):

{"time":"2026-05-02T01:23:45.678Z","severity":"info","message":"...","metadata":{...}}

To override metadata allow-list or redactors:

config :logger, :default_handler,
  formatter:
    Zexbox.Logging.json_formatter_config(
      metadata: [:request_id, :trace_id, :user_id],
      redactors: []
    )

Defaults include a curated metadata allow-list and a RedactKeys redactor stripping common credential keys (password, secret, token, authorization, api_key, session). Both overridable via opts; default_metadata/0 and default_redactors/0 are exposed for inspection or extension.
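
As a sketch of the "extension" case, a consuming app could append its own keys to the curated allow-list instead of replacing it. This assumes `default_metadata/0` returns a plain list of atoms, as the override example above suggests; `:tenant_id` is a hypothetical metadata key used only for illustration.

```elixir
# config/runtime.exs of a consuming app (illustrative only)
import Config

config :logger, :default_handler,
  formatter:
    Zexbox.Logging.json_formatter_config(
      # Keep the Zappi defaults and add one app-specific key.
      # :tenant_id is a made-up example, not part of zexbox.
      metadata: Zexbox.Logging.JsonFormatter.default_metadata() ++ [:tenant_id]
    )
```

Passing only `metadata:` should leave the default redactors in place, assuming unspecified options fall back to their defaults.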

Why declarative

:logger is configured before Application.start/2 runs. Anything logged during boot — supervisor start, dependency init, early telemetry — uses the configured formatter. A runtime installer call would activate JSON only after boot logs had already gone out in plain text. Pure config makes JSON active from the first log line emitted by the BEAM, and keeps log-shape config visible in runtime.exs rather than buried in application code.

Companion changes (not in this PR)

End-to-end the pipeline needs two more changes, each independently safe:

  1. kubernetes-infra Logstash config — extend the existing kubernetes.container.name == "rails" block in 02_config_maps.yml.j2:103-155 to also match the phoenix and elixir container names. The existing JSON parser handles the shape this formatter emits.
  2. Per-app rollout — 3-line runtime.exs change per app. skip_on_invalid_json => true is already set in the Logstash json filter, so a Phoenix container still emitting plain text (pre-rollout) is a safe no-op.

Intended sequence: ship this library → release → ship the kubernetes-infra change → roll Phoenix apps one at a time. Each phase is reversible.

Encoder choice

logger_json uses Jason as the JSON encoder by default. On Elixir 1.18+, consumers can opt into the stdlib JSON module via config :logger_json, encoder: JSON in config/config.exs (compile-time). Zexbox intentionally does not pin :jason directly so consumers can choose; Jason still arrives transitively via instream / ldclient / mix_audit / sobelow for zexbox's own dev/test.

Changes

  • lib/zexbox/logging/json_formatter.ex — new module returning the formatter tuple, with overridable default_metadata/0 and default_redactors/0.
  • lib/zexbox/logging.ex — exposes json_formatter_config/1 as a delegate, mirroring the attach_controller_logs!/0 pattern.
  • mix.exs — adds {:logger_json, "~> 7.0"}; bumps :elixir constraint from ~> 1.14 to ~> 1.15 to match logger_json 7.x. Of consuming Phoenix apps, only narrator is still on ~> 1.14 and can bump trivially; live-api, data-stories, brand_service, zappi-api are on 1.18+ and data-api on 1.15.
  • mix.lock — locks logger_json 7.0.4.
  • README.md — new ### JSON-formatted logs for Logstash / Elasticsearch section.
  • CHANGELOG.md: Unreleased entry; version bump left for whoever cuts the release.

Review history

Original shape was Zexbox.Logging.install_json_handler!/1, a runtime installer. Reworked per @brendon9x's review:

  • Dropped the runtime installer for a config-returning helper. Configuration is now visible in runtime.exs, and JSON is active from the first log line — no boot-log activation gap.
  • Dropped the bang. The new function returns a value and never raises.
  • Dropped the direct :jason pin. Encoder is the consumer's choice; on Elixir 1.18+ they can use the stdlib JSON module.

Risk

Low. New code path with no implicit activation — apps opt in via the runtime.exs config line. Elixir constraint bump is the only consumer-visible change; nothing existing in zexbox's API breaks.

Claude, but not flawed

Adds Zexbox.Logging.install_json_handler!/1 (and the underlying
Zexbox.Logging.JsonHandler) that swaps the default :logger handler's
formatter for a JSON one wrapping LoggerJSON.Formatters.Basic.

Mirrors the Ruby-side opsbox JsonFormatter so Phoenix logs land in
Elasticsearch as one structured document per event instead of fanning
multi-line content (Elixir struct inspections, multi-line SQL, stack
traces) out across many docs at the Filebeat-ingest layer.

Bumps the :elixir constraint from ~> 1.14 to ~> 1.15 to match
logger_json 7.x's requirement. Adds logger_json ~> 7.0 and
jason ~> 1.4 as dependencies.

Companion changes that need to land separately to deliver the
end-to-end pipeline:

  - kubernetes-infra: extend the Logstash filter that JSON-parses
    the rails container's message field to also match the phoenix
    and elixir container names. The existing block already handles
    the JSON shape this handler produces.

  - per-app: 3-line addition to runtime.exs, e.g.

      if config_env() == :prod do
        Zexbox.Logging.install_json_handler!()
      end

Tests: 6 new tests covering formatter swap, default options,
metadata allow-list, redactor passthrough, idempotency, and the
parent-module delegate. Full zexbox suite: 37 tests, 0 failures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Group Zexbox.Logging.JsonHandler + Zexbox.Logging.LogHandler into
the brace-expansion form so the file uses a single alias style for
that namespace.

[Consistency] Most of the time you are using the multi-alias/require/import/use
              syntax, but here you are using multiple single directives.

No behaviour change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Collaborator

@brendon9x brendon9x left a comment

Top-level question: should this wrapper exist at all?

The biggest piece of feedback isn't about the implementation — it's about whether the install_json_handler!/1 API needs to exist in the first place.

The Elixir ecosystem norm is declarative config, not runtime installers. Phoenix, Ecto, Plug, Telemetry all document a config snippet to paste; none of them ship a Foo.install!/1 wrapper. logger_json itself tells users to write this in config/runtime.exs:

config :logger, :default_handler,
  formatter: {LoggerJSON.Formatters.Basic, %{}}

That is the entire setup. Concrete downsides of the runtime-call approach in this PR vs. pure config:

  1. Late activation. :logger is configured before the application starts. Anything logged during app boot — supervisor start, dependency init, early telemetry — goes out in the old text format until install_json_handler!() runs. Pure config makes JSON active from the very first log line.
  2. Hidden seam. A new engineer looking at config/runtime.exs to understand why logs look the way they do sees nothing JSON-related. They have to know to grep application code for install_json_handler.
  3. API surface for two lines of config. Both options (:metadata, :redactors) are passthroughs into a map that would otherwise live in config. The wrapper adds a maintenance burden without abstracting anything.

The legitimate counter-argument is "we want one place to set Zappi-wide defaults so apps don't drift." That's real, but it's better served by either:

  • A documented config snippet in the Zexbox README with the standardised defaults, or
  • Exposing Zexbox.Logging.json_formatter_config/1 that returns the {LoggerJSON.Formatters.Basic, %{...}} tuple to splat into config — keeping the configuration declarative while still centralising defaults.

Recommendation: drop install!/1 and the delegate. Replace with a README section showing the exact config :logger, :default_handler, ... line plus the standardised redactors/metadata list. If centralised defaults are wanted, expose them as a function that returns config, not one that mutates :logger at runtime.

The remaining two notes assume the wrapper survives this discussion.


The ! is wrong for the return type

lib/zexbox/logging/json_handler.ex declares:

@spec install!(keyword()) :: :ok | {:error, term()}

and the moduledoc explicitly documents the {:error, reason} return. In Elixir, ! means "raises on failure." A function that returns {:error, _} must not be named with a bang. Same applies to the defdelegate ... as: :install! in lib/zexbox/logging.ex.

Pick one:

  • Drop the bang: install/1 :: :ok | {:error, term()}, or
  • Keep the bang and actually raise on {:error, reason} (e.g. wrap the :logger.update_handler_config/3 call in a case that raises a RuntimeError with the reason).
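
The two options can be sketched side by side. This is a minimal illustration of the naming convention only: `BangConventionSketch` and its `do_install/1` helper are hypothetical stand-ins (the helper simulates a fallible call such as `:logger.update_handler_config/3`), not the code under review.

```elixir
defmodule BangConventionSketch do
  # Hypothetical stand-in for a fallible operation; pass `fail: true`
  # to simulate the error branch.
  defp do_install(opts) do
    if Keyword.get(opts, :fail, false) do
      {:error, :handler_not_found}
    else
      :ok
    end
  end

  # Option 1: no bang, failures are returned as a tagged tuple.
  @spec install(keyword()) :: :ok | {:error, term()}
  def install(opts \\ []), do: do_install(opts)

  # Option 2: bang, failures raise instead of being returned.
  @spec install!(keyword()) :: :ok
  def install!(opts \\ []) do
    case do_install(opts) do
      :ok ->
        :ok

      {:error, reason} ->
        raise RuntimeError, "could not install JSON handler: #{inspect(reason)}"
    end
  end
end
```

With either option the spec and the name agree: callers of `install/1` pattern-match on the result, while callers of `install!/1` can assume `:ok` or let the exception propagate.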

Don't pin :jason as a direct dep

mix.exs adds {:jason, "~> 1.4"} alongside {:logger_json, "~> 7.0"}. Per the logger_json docs:

By default, LoggerJSON is using Jason as the JSON encoder. If you use Elixir 1.18 or later, you can use the built-in JSON module as the encoder. To do this, you need to set the :encoder option in your config.exs file. This setting is only available at compile-time:

config :logger_json, encoder: JSON

Pinning :jason in zexbox pushes the encoder choice down onto every consuming app, even those on Elixir 1.18+ that would prefer the stdlib JSON module. That's exactly the kind of constraint a base library shouldn't impose.

Recommendation: drop the :jason line from mix.exs. Let consumers pick their encoder via config :logger_json, encoder: ….

Collaborator

@brendon9x brendon9x left a comment

As per comment. I don't think this needs to exist other than docs.

Maybe an Igniter task but that feels like overkill.

@Hermanlangner
Collaborator

> As per comment. I don't think this needs to exist other than docs.
>
> Maybe an Igniter task but that feels like overkill.

An Igniter task sounds like a good starting point for having the utility of "Set up my application for observability". It should be easy enough to add, with a ton of benefits.

…r.config/1

Addresses Brendon's review on PR #57. Three changes, all in service of
keeping the logging shape declarative and idiomatic:

1. Drop the runtime installer in favour of a config-returning helper.
   `:logger` is configured before `Application.start/2` runs, so a
   `Foo.install!()` call in user code activates the JSON formatter
   *after* boot logs have already gone out in plain text. Returning the
   formatter tuple to splat into `config :logger, :default_handler,
   formatter: ...` makes JSON active from the first log line emitted by
   the BEAM and keeps log-shape config visible in `runtime.exs` rather
   than buried in application code.

2. Drop the bang on the public function. The previous
   `install!/1 :: :ok | {:error, term()}` violated Elixir naming
   convention — `!` means "raises on failure". `JsonFormatter.config/1`
   returns a value and never raises.

3. Stop pinning `:jason` directly. `logger_json` declares Jason as
   optional; pinning it in zexbox forces the encoder choice on consumer
   apps. On Elixir 1.18+ apps can now opt into the stdlib `JSON` module
   via `config :logger_json, encoder: JSON`. Jason still arrives
   transitively via `instream`, `ldclient`, `mix_audit`, and `sobelow`,
   so dev/test continues to work.

Defaults now live in `Zexbox.Logging.JsonFormatter`:
- `default_metadata/0` — curated allow-list (request_id, trace_id,
  span_id, user_id, pid, module, function, line, application).
- `default_redactors/0` — a single `RedactKeys` redactor stripping
  common credential keys (password, secret, token, authorization,
  api_key, session). Both overridable via `config/1` opts.

Tests are now `async: true` because the new function doesn't touch
global `:logger` state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@tsu-shiuan tsu-shiuan changed the title feat: add JSON log handler for Logstash / Elasticsearch ingestion feat: add JSON formatter config for Logstash / Elasticsearch ingestion May 5, 2026
Collaborator

@Hermanlangner Hermanlangner left a comment

More comments than issues. Looking at this, doing an Igniter follow-up will be really sexy.

Comment thread README.md
Comment on lines +161 to +170
In `config/runtime.exs`:

```elixir
import Config

if config_env() == :prod do
  config :logger, :default_handler,
    formatter: Zexbox.Logging.json_formatter_config()
end
```
Collaborator

Only configuration that depends on environment variables needs to live in config/runtime.exs. For something like the logger, put it in config/dev.exs or config/prod.exs instead.

The reason is that config/{dev|test|prod}.exs is evaluated at compile time, so it can't pick up environment variables at deploy time. In this case the formatter is static, so that limitation doesn't matter.

Collaborator

Reread Brendon's review to double-check. They do mention you can configure it at runtime, but in our case that's not necessary unless we want to feature-flag it (which we don't), or unless you want to get fancy for opening the console... which almost makes me backtrack on the thought.
I'd suggest taking a similar approach in the README to logger_json, which specifies both options:
https://hexdocs.pm/logger_json/readme.html

Configuration can be set using 2nd element of the tuple of the :formatter option in [Logger](https://hexdocs.pm/logger/Logger.html) configuration. For example in config.exs:

config :logger, :default_handler,
  formatter: LoggerJSON.Formatters.GoogleCloud.new(metadata: :all, project_id: "logger-101")

or during runtime:

formatter = LoggerJSON.Formatters.Basic.new(%{metadata: {:all_except, [:conn]}})
:logger.update_handler_config(:default, :formatter, formatter)

"""
  @spec config(keyword()) :: {module(), map()}
  def config(opts \\ []) do
    Basic.new(
Collaborator

I'm okay with being opinionated on this, but the logger_json spec is

new(opts)

with opts defined as type

  @type opts :: [
          {:encoder_opts, encoder_opts()}
          | {:metadata, :all | {:all_except, [atom()]} | [atom()]}
          | {:redactors, [{module(), term()}]}
          | {atom(), term()}
        ]

But we are explicitly passing in only metadata and redactors, meaning you can't use encoder_opts or any of their other atom options.
Which I am okay with tbh, but it's worth considering if we want to be comprehensive.

A cheeky way to do this with minimal change would be to make use of Keyword.merge/2,
with the left being the defaults, overridden by the right. From their docs:

Keyword.merge([a: 1, b: 2], [a: 3, d: 4])
[b: 2, a: 3, d: 4]

Keyword.merge([a: 1, b: 2], [a: 3, d: 4, a: 5])
[b: 2, a: 3, d: 4, a: 5]

Which for us we can do

[metadata: @default_metadata, redactors: @default_redactors]
|> Keyword.merge(opts)
|> Basic.new()

To test I did this in my console

metadata = [
            :request_id,
            :trace_id,
            :span_id,
            :user_id,
            :pid,
            :module,
            :function,
            :line,
            :application
          ]
[:request_id, :trace_id, :span_id, :user_id, :pid, :module, :function, :line,
 :application]
iex(5)> [m: metadata, r: 1234] |> Keyword.merge(r: 9999, o: "A random key")
[
  m: [:request_id, :trace_id, :span_id, :user_id, :pid, :module, :function,
   :line, :application],
  r: 9999,
  o: "A random key"
]
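
One behaviour worth noting with the `Keyword.merge/2` approach: an option supplied by the caller replaces the default value for that key wholesale (a caller's `metadata:` list is not appended to the default list), while keys with no default, such as `encoder_opts`, pass straight through. A minimal self-contained sketch, using a made-up module name and shortened stand-in defaults rather than the real zexbox values, and stopping short of the `Basic.new/1` call:

```elixir
defmodule MergeDefaultsSketch do
  # Stand-in defaults for illustration; the real lists live in zexbox.
  @default_metadata [:request_id, :trace_id]
  @default_redactors []

  # Returns the keyword list that would be handed to the formatter's new/1.
  @spec merged_opts(keyword()) :: keyword()
  def merged_opts(opts \\ []) do
    Keyword.merge(
      [metadata: @default_metadata, redactors: @default_redactors],
      opts
    )
  end
end

# No overrides: both defaults survive.
defaults = MergeDefaultsSketch.merged_opts()

# The caller's :metadata replaces the default list entirely, and the extra
# :encoder_opts key (value chosen arbitrarily here) is passed through.
custom = MergeDefaultsSketch.merged_opts(metadata: [:user_id], encoder_opts: [pretty: true])
```

If apps are expected to extend rather than replace the allow-list, they would need to build the combined list themselves before passing it in.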
