Skip to content

Adjust GcpLayout JSON to latest format #3586

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: 2.x
Choose a base branch
from

Conversation

ViliusS
Copy link

@ViliusS ViliusS commented Apr 2, 2025

This is my first contribution to Log4J so I'm not 100% sure if I did everything right, but here it goes.

This PR adjusts GcpLayout JSON formatting to match what is expected by GCP.

First, it formats the log timestamp field to the correct format recognized by Fluent-Bit (component of Google Cloud Logging) and Google Ops Agent. I'm not sure how it worked before, probably it didn't, but Google Cloud Logging expects timestampSecond + timestampNanos format. Ops Agent, which also can be used to feed the logs from external Log4J environment via Socket or RollingFile into GCP, also expects the same format. More over, the same timestamp format is always used in Spring GCP libraries.

Secondly, severity field now must be prefixed with logging.googleapis.com. I'm not 100% sure regarding this, because Spring GCP libraries are still using just severity field name. However Ops Agent (link above) already uses logging.googleapis.com/severity, and simple severity name is only described in legacy logging agent which is deprecated.

Third, counter cannot be used for logging.googleapis.com/insertId as it is duplicated on different threads. I've chosen to use time-based UUID, but if that increases garbage collection issues it safely can be removed. GCP will generate their own IDs anyway.

And the last but not the least, I have renamed exception, thread and logger fields. I saw the PR discussion in #543, however these fields are pretty standard when logging via Logback's JSON layout and then they are reused in Google's Spring GCP libraries. Sadly, I didn't find how to completely remove exception field from JSON output if it's empty. Looks like all empty fields are removed except for pattern type. I also found https://issues.apache.org/jira/browse/LOG4J2-3053 regarding that, but not sure about its state. This is not a big issue but is preferable for GCP.
By the way, Log4J docs are a little bit incorrect. If message field exist, having separate exception does nothing to catch the error in Google Error Reporting. According to Google docs, if message field exist it must contain stack trace and only that stack trace is used in the Error Reporting UI. I tested this extensively. But I left separate exception field if one wants to filter it in the logs.

All in all, this should provide much better JSON field mapping experience on GCP. If you think this PR is usable I then will go ahead and fix tests for timestamp processing.

Copy link
Contributor

@ppkarwasz ppkarwasz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ViliusS,

Thanks for you contribution.

Did you test it on an actual Google Cloud instance? I am not a GCP user, but some of the changes you propose differ from the documented log format.

Copy link

github-actions bot commented Apr 2, 2025

Job Requested goals Build Tool Version Build Outcome Build Scan®
build-macos-latest clean install 3.9.8 Build Scan PUBLISHED
build-ubuntu-latest clean install 3.9.8 Build Scan PUBLISHED
build-windows-latest clean install 3.9.8 Build Scan PUBLISHED
Generated by gradle/develocity-actions

@ViliusS
Copy link
Author

ViliusS commented Apr 2, 2025

@ViliusS,

Thanks for you contribution.

Did you test it on an actual Google Cloud instance? I am not a GCP user, but some of the changes you propose differ from the documented log format.

Yes I did. Here are some screenshots.

Exception caught because of the stack trace in message field:
image
Expanded jsonPayload fields in the logging:
image
Proof about automatic timestamp mapping (previously this field was generated by Google and the one sent from Log4J lived in jsonPayload.timestamp:
image

The same error caught in Error Reporting UI:
image

@ViliusS ViliusS requested a review from ppkarwasz April 2, 2025 14:48
@ppkarwasz ppkarwasz requested review from vy and removed request for ppkarwasz April 13, 2025 06:59
@ppkarwasz
Copy link
Contributor

Let's wait for @vy's review.

@ViliusS, in this repository we require signed commits, so you need to:

  1. Add your GPG public key to your GitHub Settings if you didn't do that already.
  2. Rebase the PR to resign all the commits.

@vy vy added layouts Affects one or more Layout plugins bug Incorrect, unexpected, or unintended behavior of existing code labels Apr 17, 2025
ViliusS added 7 commits April 21, 2025 12:32
First, it formats the log timestamp field to the correct format
recognized by Fluent-Bit (component of Google Cloud Logging) and Google
Ops Agent.

Secondly, severity field now must be prefixed with
logging.googleapis.com.

Third, counter cannot be used for insertId as it is duplicated on
different threads.

And the last but not the least, exception, thread and logger fields are
pretty standard when logging via Logback's JSON layout and Google's
Spring GCP libraries. Field name changes now match these other loggers.
@ViliusS ViliusS force-pushed the gcplayout-improvements branch from 718ccbd to 5e0656e Compare April 21, 2025 09:34
@ViliusS
Copy link
Author

ViliusS commented Apr 21, 2025

Let's wait for @vy's review.

@ViliusS, in this repository we require signed commits, so you need to:

  1. Add your GPG public key to your GitHub Settings if you didn't do that already.
  2. Rebase the PR to resign all the commits.

Done.

@vy
Copy link
Member

vy commented Apr 23, 2025

One particular thing that is bothering me about this fix is... it breaks backward compatibility. We can say we're fixing the schema, but for users who have already been using the broken schema, we are gonna create trouble. We can

  • Cross our fingers, release the fixed schema, and think of a solution if we ever happen to receive a complaint
  • Release the new schema under a different name (e.g., GcpLayout2.json)

@ppkarwasz, thoughts?

@ViliusS, would you mind adding a changelog entry file (i.e., src/changelog/.2.x.x/3586_fix_GcpLayout.xml), please? (Mind the entry.type attribute and entry > issue element.)

@ppkarwasz
Copy link
Contributor

One particular thing that is bothering me about this fix is... it breaks backward compatibility. We can say we're fixing the schema, but for users who have already been using the broken schema, we are gonna create trouble. We can

  • Cross our fingers, release the fixed schema, and think of a solution if we ever happen to receive a complaint
  • Release the new schema under a different name (e.g., GcpLayout2.json)

I would call the new format GoogleOpsAgent.json, FluentBit.json or something similar.

However, I am not entirely sure that this template file belongs to Log4j. The only users that can profit from it are Google Ops Agent users, so we might consider moving the file to that project instead. See also GoogleCloudPlatform/ops-agent#1920 (comment)

@ViliusS
Copy link
Author

ViliusS commented Apr 23, 2025

One particular thing that is bothering me about this fix is... it breaks backward compatibility. We can say we're fixing the schema, but for users who have already been using the broken schema, we are gonna create trouble. We can

  • Cross our fingers, release the fixed schema, and think of a solution if we ever happen to receive a complaint
  • Release the new schema under a different name (e.g., GcpLayout2.json)

@ppkarwasz, thoughts?

@ViliusS, would you mind adding a changelog entry file (i.e., src/changelog/.2.x.x/3586_fix_GcpLayout.xml), please? (Mind the entry.type attribute and entry > issue element.)

Changelog added.

Regarding backward compatibility I see it this way. The proposed changes are not really fixes, they are changes/improvements. Old schema probably worked in some sense, but required a little bit tinkering for proper GCP field mapping on Google side. It is natural that schemas change and they need to be re-adjusted, there is no way to avoid that when dealing with 3rd party services. The real question is when backward compatibility and how it should be broken.
You are right that some users could be using old schema already, so let's try to list what's actually changed and the use cases.

  1. The timestamp field users who just send the data from previous field directly to GCP not realizing that it is not mapped to real log timestamp. Nothing serious changes for them, new timestamp fields will be sent to GCP and automatically mapped to log timestamp. They will be missing previous jsonPayload.timestamp field on GCP side, but that should not break anything serious.
  2. Timestamp field users who processed current field values with Ops Agent or any other tool to be able to map it to GCP timestamp. Such tools usually just ignore missing field, but again information is now sent to GCP's timestamp field automatically, so nothing is lost.
  3. Removed insertId field. Nothing should depend on this field. It is for internal GCP usage and will now be generated automatically.
  4. Renamed/Changed exception format and removed separate class and message subfields. Some could depend on these fields in Google Cloud Monitoring but would consider it as an edge case, since the exception was/is duplicated in main message field anyway and was/is caught by the Google Error Reporting automatically. There was/is not need to configure separate monitoring.
  5. Renamed thread and logger fields. Some could depend on these fields in Google Cloud Monitoring but, again, I would consider it as an edge case.

Considering that new adjustments makes it a lot easier to send logs to GCP now without major processing in between, and if they are included in 2.25.0, I think it is fair.

@ppkarwasz I wouldn't call it OpsAgent. This format could be used to print directly to stdout which is then automatically sent to GCP on Google's Kubernetes Engine or serverless AppEngine services. I would not differentiate it from ECS or Gelf you already support.
I also don't think it belongs to Ops Agent repo. If, for example, I want to use two monitoring systems and both require different formats moving JSON template elsewhere means that I must use Ops Agent for one monitoring solution, and then use different agent for another solution. Having JSON templates in central repo makes things must easier and more transparent.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Incorrect, unexpected, or unintended behavior of existing code layouts Affects one or more Layout plugins
Projects
Status: To triage
Development

Successfully merging this pull request may close these issues.

3 participants