JSON-SEQ is painful #434
Comments
I've been using and parsing qlog quite extensively for my research, and I can see how it can become painful, especially for debugging purposes (sometimes you just have to browse through a file to find the exact problem that triggers some weird behaviour that appears during one run out of 1000). I'm not sure how representative my use case is of real-world scenarios, but I had to do that a lot. It is probably close to what implementers like @marten-seemann sometimes have to do. If it really breaks existing tools that worked fine with the previous format (I think it was close to, if not exactly, jsonlines, which seems quite popular?), I guess it makes sense to at least discuss that? That being said, developing a simple converter from json-seq to jsonlines for debugging use cases that require grep, vim & co may be a good middle-ground solution, without necessarily having to change the spec.
Personally, I've had no issues with the JSON-SEQ format. RFC 7464 is clear on the various edge cases that might arise when parsing truncated logs; there is no equivalent IETF standard for a one-event-per-line format. The default serialization format for the QLogFile is JSON, which also allows "pretty-printing" styles that include newlines, so I don't see the grep argument as too strong (although I do understand it). The majority of the time I just print each event on a single line. But I do have use cases where qlogs are meant to be eyeballed by a human, and having pretty printing does actually help those. VS Code deals with the record separator character just fine. There are users of quiche's standalone qlog crate, and users of quiche for client and server; I have heard of them having no issue with JSON-SEQ over a single line. We also provide a very simple reader class that seems to work OK, although it could probably be hardened by following more advice from RFC 7464.
A middle ground might be to subset JSON-SEQ and state something along the lines of "In addition to the formatting rules defined in JSON-SEQ, it is RECOMMENDED that loggers do not embed newlines inside serialized events; this makes it easier for general-purpose, line-oriented tools to consume them." @marten-seemann might argue for a
Maybe an interesting point: xquic logs to slog/clog and then converts to JSON via Python: https://github.com/alibaba/xquic/blob/main/scripts/qlog_parser.py. And tangentially, there's an issue on msquic about translation from their preferred format to qlog: microsoft/msquic#1158.
@LPardue makes a good point. What QLog defines is an interchange format between tools, not the format that your tools are supposed to log natively. People have the freedom to log in the format they like, then convert that to QLog. FWIW, quicly has taken the same approach. That said, I think I agree with @marten-seemann and @francoismichel regarding whether there is a reason to endorse a serialization format that has rough edges compared to JSONL. Knowing that JSONL is not standardized, one option we have is to stop short of declaring how we concatenate the events. The idea is that while it is paramount to define how each event is serialized, it is not that important to agree on one way of concatenating the events. IMO, it can be JSONL, JSON-SEQ, or one JSON array. If most people prefer JSONL, we will end up consolidating on JSONL. Does that sound like a better strategy compared to naming JSON-SEQ in a normative manner?
Speaking personally with no hats, I don't think we should spend QUIC WG time relitigating all of the finer details of JSON encoding and parsing for some newline-delimited format. Leaving it woolly defined doesn't seem acceptable to me. The spec is targeted at tool makers too; selling them short risks interop failures in the structured-logging nature of qlog. That doesn't seem like a fair tradeoff when tools that work on unstructured line data can work fine if each event is contained on a single line.
As a very boring example: what media type would a hand-rolled NDJSON use? Right now we have proposed to piggyback off of the existing
I might be blissfully ignorant regarding the edge cases of JSON serialization, but in my mind, NDJSON is trivially simple to define: each event is serialized as JSON on a single line, events are separated by a newline character, and no newlines appear within a serialized event.
That's it.
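For illustration, here is a minimal sketch of the two framings under discussion; the event bodies and file names are made up, and it assumes bash's printf escape handling:

```sh
# NDJSON / JSONL: one compact JSON text per line, separated only by LF
printf '%s\n' '{"time":0,"name":"transport:packet_sent"}' \
              '{"time":1,"name":"transport:packet_received"}' > events.jsonl

# JSON-SEQ (RFC 7464): every record starts with an RS byte (0x1E) and ends with LF;
# the JSON text itself is allowed to be pretty-printed across several lines
printf '\x1e%s\n' '{"time":0,"name":"transport:packet_sent"}' \
                  '{"time":1,"name":"transport:packet_received"}' > events.sqlog
```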
I'm not convinced that adding a MIME type to the file itself was the right thing to do, see #435. If we remove this field, we might not need to specify anything. As a general point, what we should optimize for is (new) developers / researchers handling qlogs. It should be dead simple to take a bunch of qlogs from different implementations and use them to debug a QUIC transfer. Of course the file format doesn't matter if you're only using tools that understand the intricacies of qlog, like qvis. But analysis often involves using standard Unix tools. Simple example: count how many times a particular event happened (a sketch of such a one-liner follows this comment). I do realize that this might make it harder to push the spec through the IETF process. This is unfortunate, and certainly annoying for us as editors and for the QUIC WG chairs. But in the end, we're not writing a spec so that it fits neatly into the set of existing RFCs. We're writing a spec so that we have a serialization format that's maximally useful for devs and researchers.
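As a rough illustration of the kind of one-liner meant above, a sketch under the assumption that each event is serialized compactly on its own line (the file and event names are hypothetical):

```sh
# Count how often a particular event appears, assuming one event per line
# and a compact serialization without spaces around the colon
grep -c '"name":"transport:packet_sent"' client.qlog
```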
client-1928a4431b958983545fe79a00fcf7a670f39d8e.zip unzips to client-1928a4431b958983545fe79a00fcf7a670f39d8e.sqlog, which is a JSON-SEQ file that includes some pretty printing e.g.
It works:
or
or even
that last one even has a fragment you can use to seamlessly convert to line-delimited for your own needs, e.g. here's the output of
Honestly, for anyone wrangling JSON,
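For readers following along, a hedged sketch of the kind of jq invocations being described; the file name, event name, and field layout are assumptions rather than taken from the attachment above:

```sh
# Count a particular event in a JSON-SEQ qlog (--seq enables RFC 7464 framing)
jq --seq 'select(.name == "transport:connection_started") | .name' client.sqlog | wc -l

# Re-serialize to one compact event per line; tr strips the RS (0x1E) bytes that
# jq emits on output when --seq is set
jq --seq -c '.' client.sqlog | tr -d '\036' > client.jsonl
```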
It works because you picked an example that's so simple that JSON-SEQ doesn't break it. Unfortunately, that doesn't apply to the vast majority of things you might want to do. For example, try counting the number of packets sent that contained a MAX_STREAM_DATA frame. You'd need something along the lines of the query sketched after this comment. Sure, you can use
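One possible shape for that query (a sketch, not the exact command from the original comment; it assumes packet_sent events list their frames under data.frames with a frame_type field, as in recent qlog drafts):

```sh
# Count packet_sent events that contain at least one MAX_STREAM_DATA frame
jq --seq -n '[inputs
              | select(.name == "transport:packet_sent")
              | select(any(.data.frames[]?; .frame_type == "max_stream_data"))
             ] | length' client.sqlog
```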
wrt vim, this is expected behaviour, as described in https://vimdoc.sourceforge.net/htmldoc/options.html#%27ruler%27
Anyone who likes to program with TAB would already be used to this.
Adding
As an individual without hats: I fully agree with @marten-seemann here. I think standards should reflect how people actually use stuff, and the JSON-SEQ thing is plain annoying, even if things like jq have support for it (and we can "work around" it by having transformers and whatnot; my time at a larger company has taught me the absolute value of "sensible defaults" and "predictability through familiarity"). It really annoyed me having to switch from NDJSON to JSON-SEQ when we did, just because NDJSON doesn't have an (IETF) RFC (even if I understand the arguments).
As an editor with too many years on this already: I have zero (0) appetite to try and switch this again at this stage, and especially to push it through the (eventual) IESG/whatever process needed to finally get these out the door. With the recent extensibility changes, we can easily write an extension draft that defines how to still use NDJSON; it just won't be part of the default/first set of documents (Oh RFC 9218, have we learned nothing). I'm also perfectly fine with keeping support for NDJSON in qvis, for example.
Discussed during the editors meeting. This will need a face-to-face discussion with @marten-seemann during the upcoming meeting to resolve.
Discussed during the editors meeting with me, Marten, and Lucas. We can't agree among ourselves whether to switch to NDJSON, add NDJSON, or just stick with JSON-SEQ. However, we also feel this is not the most important issue to solve right now, so we will mention it during the upcoming IETF meeting in Dublin to see if people want to come to this issue and advocate for one option or another, but we won't call for a vote/hum on this from the WG at this point.
This needs to be raised on the QUIC mailing list before then, to reach the broadest set of people.
When we switched to JSON-SEQ a while back (#172), we assumed that using record separators instead of newline-delimited JSON wouldn't cause too much pain. I think it's time to reevaluate this assumption.

vim

It also turns out that existing software is not that great at handling record separators. Here's a screenshot of vim messing up the line character count (note the last line, where it usually tells you the cursor position: it now displays 12-13).

jq

jq is actually the tool that's most compatible with JSON-SEQ. However, one MUST use the --seq flag; using standard jq results in the following error, which is not very helpful for the user. I had to use a search engine to figure out how to convince jq to use record separators. Definitely not the end of the world, but a subpar developer experience nonetheless.

grep

Another example is grep. Try to extract the transport:connection_started event from this qlog file using grep. First of all, note how GitHub renders the record separator here: � (I love seeing this character on my screen!). But back to grep: I haven't figured out a way to grep for the transport:connection_started event, and apparently it's not easy to change the record separator. grep-ing is very important when debugging; I often find myself filtering a qlog file for the event type(s) I'm interested in.
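For what it's worth, a possible workaround along the lines discussed in the comments above (a sketch; it assumes jq is available, a hypothetical file name, and that events carry a top-level name field):

```sh
# Strip the RS (0x1E) bytes, re-serialize one event per line, then grep as usual
tr -d '\036' < client.sqlog | jq -c '.' | grep '"transport:connection_started"'
```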