Skip to content

Conversation

@bobhansen
Copy link
Owner

Initial MVP of structured logging with JSON output and threadlocal context accumulation.

@bobhansen bobhansen changed the title Strucutred spdlog v0 Structured spdlog v0 Jan 20, 2022
Copy link

@bhumbers bhumbers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Edit: Er, sorry... still reviewing. Thought I was just approving the first commit in the chain. It's been too long since I used GitHub...

Copy link

@bhumbers bhumbers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM; no required changes, just comments/questions

Comment on lines +16 to +27
#ifndef SPDLOG_NO_STRUCTURED_SPDLOG
field_buffer = std::vector<Field>(field_data, field_data + field_data_count);
field_data = field_buffer.data();

// Copy strings from fields
for (size_t i=0; i < field_data_count; i++) {
buffer.append(field_data[i].name);
if (field_data[i].value_type == FieldValueType::STRING_VIEW) {
buffer.append(field_data[i].string_view_.begin(), field_data[i].string_view_.end());
}
}
#endif

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OOC, is there a particular reason to pack string view field data into buffer in front of logger_name and payload vs after, or just an arbitrary choice?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No strong reason; it would be good to get them in the order that they're going to be consumed to maybe get a little bit of cache advantage, but I don't have any numbers to back that up.

In a previous rev; we were packing the fields into the buffer (since they're trivially copyable), and we were more likely to have proper alignment. As it turns out, that was even harder than I thought and I rolled it back.

Comment on lines +43 to +45
case FieldValueType::CHAR: dest.push_back(field.char_); break;
case FieldValueType::UCHAR: details::fmt_helper::append_int(field.uchar_, dest); break;
case FieldValueType::WCHAR: details::fmt_helper::append_string_view(std::to_string(field.wchar_), dest); break;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, this means, unlike char and wchar, uchar's will be converted to a decimal value when appended to a string. Is that the expected behavior? I think "yes", but want to double-check. The alternative would be to treat as a char and append the character value.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My intuition here is that anybody using an unsigned is using it numerically rather than char-ly. That matches my experience, but I'm open to rebuttal.

Comment on lines 28 to 177
SPDLOG_CONSTEXPR char hex_digits[] = "0123456789abcdef";

// 5 -> escape to \uXXXXX
// 1 -> escape to \n, \r or the like
SPDLOG_CONSTEXPR uint8_t extra_chars_lookup[256] = {
5, 5, 5, 5, 5, 5, 5, 5, 1, 1, 1, 5, 1, 1, 5, 5, // 0x0x
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, // 0x1x
0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0x2x
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0x3x
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0x4x
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, // 0x5x
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0x6x
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0x7x
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0x8x
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0x9x
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0xAx
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0xBx
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0xCx
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0xDx
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0xEx
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0xFx
};

// TODO: make a version that can take a list of [start,end] ranges; we can then
// write the whole line to the dest buffer and do one pass of escaping: count,
// realloc, escape

SPDLOG_INLINE void escape_to_end(spdlog::memory_buf_t &dest, size_t start_offset)
{
// Check to see if we have any characters that must be escaped
// See https://datatracker.ietf.org/doc/html/rfc8259#section-7
// Note that only certain ASCII characters need to be; all non-ASCII
// utf-8 codepoints can pass right through

// TODO: Note that this code is not correct for UTF-8 inputs; if we have
// a UTF-8 value like \u03a3 (Sigma), the UTF-8 encoding is \xCE\xA3, and
// it will be encoded as \u00ce\u00a3 rather than \u03a3.
//
// We need to actually parse the input as utf-8 codepoints to
// know which bytes are consumed in multi-byte utf-8 encodings
// See https://datatracker.ietf.org/doc/html/rfc3629

// TODO: widechar support
static_assert(sizeof(dest[0]) ==1, "Wide chars are not supported by escape_to_end yet");

size_t extra_chars_required = 0;
for (auto i=start_offset; i < dest.size(); i++) {
uint8_t c = dest[i]; // need to make it unsigned
extra_chars_required += extra_chars_lookup[c];
}
if (extra_chars_required == 0) {
return; // No escaping to be done
}
size_t original_size = dest.size();
dest.resize(dest.size() + extra_chars_required);

// Work backward until done
auto start_p = dest.data() + start_offset;
auto src_p = dest.data() + original_size - 1;
auto dest_p = src_p + extra_chars_required;
while (src_p >= start_p) {
uint8_t c = *src_p;
switch(extra_chars_lookup[c]) {
case 5:
dest_p -= 5;
dest_p[0] = '\\';
dest_p[1] = 'u';
dest_p[2] = '0';
dest_p[3] = '0';
dest_p[4] = hex_digits[(c >> 4) & 0x0f];
dest_p[5] = hex_digits[c & 0x0f];
break;
case 1:
dest_p -= 1;
dest_p[0] = '\\';
switch(c) {
case '"':
dest_p[1] = '"';
break;
case '\\':
dest_p[1] = '\\';
break;
case '\b':
dest_p[1] = 'b';
break;
case '\f':
dest_p[1] = 'f';
break;
case '\n':
dest_p[1] = 'n';
break;
case '\r':
dest_p[1] = 'r';
break;
case '\t':
dest_p[1] = 't';
break;
default:
abort(); // should never get here
} // switch(c)
break;
case 0:
*dest_p = c;
break;
default:
abort(); // should never get here
} // switch(extra_chars_lookup[c])

dest_p--;
src_p--;
}

// They should have converged at the start
assert(dest_p == src_p);
}

SPDLOG_INLINE bool pattern_needs_escaping(string_view_t pattern)
{
// Look for any % pattern that isn't in our whitelist of known-doesn't-need-escaping patterns
// Anything that can only emit printable ASCII is OK.

constexpr char KNOWN_CLEAN_PATTERNS[] = "LtplLaAbBcCYDxmdHIMSefFprRTXzE%#oiuO";

for (size_t i=0; i < pattern.size(); i++) {
// Check for % flags
uint8_t c = static_cast<uint8_t>(pattern[i]);
if (c == '%' && i < pattern.size() - 1) {
i++;
c = static_cast<uint8_t>(pattern[i]);

// TODO: make this a binary search
bool flag_char_is_clean = false;
for (auto clean: KNOWN_CLEAN_PATTERNS) {
if (clean == c) {
flag_char_is_clean = true;
break;
}
}
if (!flag_char_is_clean) {
return true;
}
}

// Check for json-escaped character in pattern
if (extra_chars_lookup[c] != 0) {
return true;
}
}
return false;
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: This section is pretty dense reading that I didn't 100% follow; I'm trusting deeply in the unit tests for this code to catch any errors

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh. Yeah. There's some amount of cleverness there, and Hansen's Second Law states "clever is the opposite of maintainable." :-)

I'll add some extra docs to give whoever follow a few more guideposts.

Copy link
Owner Author

@bobhansen bobhansen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've taken a few action items for a follow-up diff:

  • pattern_needs_escaping and escape_to_end: more docs
  • iterator_catecory in structured_spdlog.h
  • test_json_formatter.cpp: magic number 39
  • CMakeLists extraneous github

Comment on lines +135 to +144
spdlog::context ctx0({{"ctx", 1}});
spdlog::context ctx1({{"ctx", 1}});
spdlog::context ctx2({{"ctx", 1}});
spdlog::context ctx3({{"ctx", 1}});
spdlog::context ctx4({{"ctx", 1}});
spdlog::context ctx5({{"ctx", 1}});
spdlog::context ctx6({{"ctx", 1}});
spdlog::context ctx7({{"ctx", 1}});
spdlog::context ctx8({{"ctx", 1}});
spdlog::context ctx9({{"ctx", 1}});
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the current implementation, no, it's all the same. There's a computationally-not-free feature where contexts can overwrite parent values, but for now they only accumulate.

It would be better to make them unambiguous, though.

Comment on lines +16 to +27
#ifndef SPDLOG_NO_STRUCTURED_SPDLOG
field_buffer = std::vector<Field>(field_data, field_data + field_data_count);
field_data = field_buffer.data();

// Copy strings from fields
for (size_t i=0; i < field_data_count; i++) {
buffer.append(field_data[i].name);
if (field_data[i].value_type == FieldValueType::STRING_VIEW) {
buffer.append(field_data[i].string_view_.begin(), field_data[i].string_view_.end());
}
}
#endif
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No strong reason; it would be good to get them in the order that they're going to be consumed to maybe get a little bit of cache advantage, but I don't have any numbers to back that up.

In a previous rev; we were packing the fields into the buffer (since they're trivially copyable), and we were more likely to have proper alignment. As it turns out, that was even harder than I thought and I rolled it back.

Comment on lines 61 to +63
buffer.clear();
buffer.append(other.buffer.data(), other.buffer.data() + other.buffer.size());
field_buffer = other.field_buffer;
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit of a speed hack in the spdlog code. memory_buffer.clear() doesn't free any backing memory, so it's faster to just copy the data over.

field_buffer is a vector that doesn't have any backing memory by default, so we don't get the same gains with it. This code could be optimized a bit, but I want to see profiler results before going down that road.

Comment on lines 28 to 177
SPDLOG_CONSTEXPR char hex_digits[] = "0123456789abcdef";

// 5 -> escape to \uXXXXX
// 1 -> escape to \n, \r or the like
SPDLOG_CONSTEXPR uint8_t extra_chars_lookup[256] = {
5, 5, 5, 5, 5, 5, 5, 5, 1, 1, 1, 5, 1, 1, 5, 5, // 0x0x
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, // 0x1x
0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0x2x
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0x3x
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0x4x
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, // 0x5x
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0x6x
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0x7x
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0x8x
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0x9x
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0xAx
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0xBx
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0xCx
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0xDx
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0xEx
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0xFx
};

// TODO: make a version that can take a list of [start,end] ranges; we can then
// write the whole line to the dest buffer and do one pass of escaping: count,
// realloc, escape

SPDLOG_INLINE void escape_to_end(spdlog::memory_buf_t &dest, size_t start_offset)
{
// Check to see if we have any characters that must be escaped
// See https://datatracker.ietf.org/doc/html/rfc8259#section-7
// Note that only certain ASCII characters need to be; all non-ASCII
// utf-8 codepoints can pass right through

// TODO: Note that this code is not correct for UTF-8 inputs; if we have
// a UTF-8 value like \u03a3 (Sigma), the UTF-8 encoding is \xCE\xA3, and
// it will be encoded as \u00ce\u00a3 rather than \u03a3.
//
// We need to actually parse the input as utf-8 codepoints to
// know which bytes are consumed in multi-byte utf-8 encodings
// See https://datatracker.ietf.org/doc/html/rfc3629

// TODO: widechar support
static_assert(sizeof(dest[0]) ==1, "Wide chars are not supported by escape_to_end yet");

size_t extra_chars_required = 0;
for (auto i=start_offset; i < dest.size(); i++) {
uint8_t c = dest[i]; // need to make it unsigned
extra_chars_required += extra_chars_lookup[c];
}
if (extra_chars_required == 0) {
return; // No escaping to be done
}
size_t original_size = dest.size();
dest.resize(dest.size() + extra_chars_required);

// Work backward until done
auto start_p = dest.data() + start_offset;
auto src_p = dest.data() + original_size - 1;
auto dest_p = src_p + extra_chars_required;
while (src_p >= start_p) {
uint8_t c = *src_p;
switch(extra_chars_lookup[c]) {
case 5:
dest_p -= 5;
dest_p[0] = '\\';
dest_p[1] = 'u';
dest_p[2] = '0';
dest_p[3] = '0';
dest_p[4] = hex_digits[(c >> 4) & 0x0f];
dest_p[5] = hex_digits[c & 0x0f];
break;
case 1:
dest_p -= 1;
dest_p[0] = '\\';
switch(c) {
case '"':
dest_p[1] = '"';
break;
case '\\':
dest_p[1] = '\\';
break;
case '\b':
dest_p[1] = 'b';
break;
case '\f':
dest_p[1] = 'f';
break;
case '\n':
dest_p[1] = 'n';
break;
case '\r':
dest_p[1] = 'r';
break;
case '\t':
dest_p[1] = 't';
break;
default:
abort(); // should never get here
} // switch(c)
break;
case 0:
*dest_p = c;
break;
default:
abort(); // should never get here
} // switch(extra_chars_lookup[c])

dest_p--;
src_p--;
}

// They should have converged at the start
assert(dest_p == src_p);
}

SPDLOG_INLINE bool pattern_needs_escaping(string_view_t pattern)
{
// Look for any % pattern that isn't in our whitelist of known-doesn't-need-escaping patterns
// Anything that can only emit printable ASCII is OK.

constexpr char KNOWN_CLEAN_PATTERNS[] = "LtplLaAbBcCYDxmdHIMSefFprRTXzE%#oiuO";

for (size_t i=0; i < pattern.size(); i++) {
// Check for % flags
uint8_t c = static_cast<uint8_t>(pattern[i]);
if (c == '%' && i < pattern.size() - 1) {
i++;
c = static_cast<uint8_t>(pattern[i]);

// TODO: make this a binary search
bool flag_char_is_clean = false;
for (auto clean: KNOWN_CLEAN_PATTERNS) {
if (clean == c) {
flag_char_is_clean = true;
break;
}
}
if (!flag_char_is_clean) {
return true;
}
}

// Check for json-escaped character in pattern
if (extra_chars_lookup[c] != 0) {
return true;
}
}
return false;
}
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh. Yeah. There's some amount of cleverness there, and Hansen's Second Law states "clever is the opposite of maintainable." :-)

I'll add some extra docs to give whoever follow a few more guideposts.

Comment on lines +43 to +45
case FieldValueType::CHAR: dest.push_back(field.char_); break;
case FieldValueType::UCHAR: details::fmt_helper::append_int(field.uchar_, dest); break;
case FieldValueType::WCHAR: details::fmt_helper::append_string_view(std::to_string(field.wchar_), dest); break;
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My intuition here is that anybody using an unsigned is using it numerically rather than char-ly. That matches my experience, but I'm open to rebuttal.

REQUIRE(to_string(buffer) == "\\\\\\r\\n\\t\\b\\f\\\"");

// Not escaped
buffer.resize(256 - 39);
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll call it out to a constant.

// Can we construct fields with rvalues?
const char * cstr = "str";
auto cstr_f = spdlog::F(cstr, cstr);
REQUIRE(cstr_f.value_type == spdlog::FieldValueType::INT);
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add additional comments. There was a lot of fighting with the compiler. It's super-useful to have literals go to the string_view constructor, with a compile-time size calculation. That's implemented with the const char (&val)[N] constructor.

If I include a const char * ctor, string literals will bind closer to that, and we'll have to calculate the string size at runtime. Since I expect the vast majority of strings to be passed in as either std::string or string literals, I thought this was a good starting spot.

I'd like to eventually ensure that both compile-time sizes for string literals, run-time O(1) sizes for std::strings, and dynamic sizes for const char * are all supported.

logger->flush();
REQUIRE(test_sink->msg_counter() < messages);
REQUIRE(test_sink->msg_counter() > 0);
REQUIRE(tp->overrun_counter() > 0);
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logger->flush() ensures that the sink has seen all queued messages.

}
auto end = curr + 1; // end is exclusive

while (curr != start && *(curr-1) != '\n' && *(curr-1) != '\r' && curr != start) {
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to really, really make sure that we weren't at start. :-)

require_message_count(TEST_FILENAME, 2);
REQUIRE(last_line(file_contents(TEST_FILENAME)) == "Test message 4 f:1");

spdlog::set_default_logger(std::move(orig_default_logger));
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be possible with a scope guard idiom, but I would favor clean code over cascading failure cases.

logger &operator=(logger other) SPDLOG_NOEXCEPT;
void swap(spdlog::logger &other) SPDLOG_NOEXCEPT;

void log(source_loc loc, level::level_enum lvl, std::initializer_list<Field> fields, string_view_t msg)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am correct in that this will only support a flat json object? No support for nested json objects or json arrays?

Copy link
Owner Author

@bobhansen bobhansen Mar 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is correct. It's high in the list of TODOs in the README.

I toyed around with different interfaces. I've considered writing some glue for objects that are serializable to https://github.com/nlohmann/json (or other popular libraries), but I hate to add dependencies for spdlog (I'm also not sure how performant the interfaces are).

Data is currently stuffed into a big ol' union of fundamental C++ types; if we supported lists and objects, there would have to be additional allocations for data (at least for multithreaded loggers).

@cfxegbert - what kind of interfaces would you like to see?

Copy link

@cfxegbert cfxegbert Mar 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am currently using https://github.com/takeshibaconsuzuki/structlog in my current project which uses nlohmann::json. I like your approach of having it as an argument to the log call instead of returning a context object I modify.

I have a visitor on a std::variant that returns a json object.

    m_logger->debug("command response")({
        {"command", enumToString(Command(response.command))},
        {"errorCode", enumToString(response.errorCode)},
        {"result", std::visit(VisitResult(), response.result)}
    });
struct VisitResult {
    auto operator()(const bool val) const noexcept {
        return nlohmann::json(val);
    }
    auto operator()(const std::uint16_t val) const noexcept {
        return nlohmann::json(toHex(val));
    }

    ...

    auto operator()(const Murideo::Timing &timing) const noexcept {
        return nlohmann::json({{"Timing", enumToString(timing)}});
    }
    auto operator()(const Murideo::RGBPatternStatus &pattern) const noexcept {
        return nlohmann::json({
            {"color", {
                          float(pattern.red)/255.0F,
                          float(pattern.green)/255.0F,
                          float(pattern.blue)/255.0F
                      }},
            {"background", {
                               float(pattern.redBackground)/255.0F,
                               float(pattern.greenBackground)/255.0F,
                               float(pattern.blueBackground)/255.0F
                           }},
            {"coverage", pattern.percent},
            {"hdr", pattern.hdr}
        });
    }

    ...

}

As you can see I am returning just a json value or a nested object from my visitor. The fields color and background on the pattern status are lists of floats.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants