Formatting of Base64 encoded binaries, i.e. long lines with no spaces, separated by newlines is not ideal #585

Open
rjmunro opened this issue Oct 17, 2024 · 1 comment
Labels
bug Something isn't working

rjmunro commented Oct 17, 2024

Describe the bug

I had some Base64 data which I formatted onto lines of length 76 separated by newlines. I embedded it into some data, stringified with YAML and the results looked like:

foo:
  bar:
    baz:
      base64: >-
        dGVzdGluZyB0ZXN0aW5nIHRlc3RpbmcgdGVzdGluZyB0ZXN0aW5nIHRlc3RpbmcgdGVzdGluZyB0

        ZXN0aW5nIHRlc3RpbmcgdGVzdGluZyB0ZXN0aW5nIHRlc3RpbmcgdGVzdGluZyB0ZXN0aW5nIHRl

        c3RpbmcgdGVzdGluZyB0ZXN0aW5nIHRlc3RpbmcgdGVzdGluZyB0ZXN0aW5nIA==

It would make more sense to use the | instead of > and then it wouldn't need to put a blank line between each line of data. I can see that it hasn't done that because it sees the lines are long, so it wants to wrap them at spaces, but there are no spaces to wrap.
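The blank lines follow from how folded scalars are read back: a single line break inside a `>` block is folded into a space, so a blank line is needed to keep each real newline. Below is a simplified sketch of that rule. `unfold` is a hypothetical helper for illustration, not part of the yaml package, and it assumes no more-indented lines and ignores trailing-newline chomping.

```javascript
// Simplified model of how a parser reads the body of a folded (>) block scalar.
// Assumption: no more-indented lines, and trailing newlines are not considered.
function unfold(folded) {
  return folded.replace(/\n+/g, (run) =>
    // One line break folds into a space; N breaks keep N - 1 newlines.
    run.length === 1 ? " " : "\n".repeat(run.length - 1)
  );
}

const tight = "AAAA\nBBBB\nCCCC";      // no blank lines between data lines
const spaced = "AAAA\n\nBBBB\n\nCCCC"; // one blank line between data lines

console.log(JSON.stringify(unfold(tight)));  // "AAAA BBBB CCCC"
console.log(JSON.stringify(unfold(spaced))); // "AAAA\nBBBB\nCCCC"
```

Under this model, only the blank-line form round-trips the original newlines, which is why the folded output above looks double-spaced.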

I was able to fix it by shortening my base64 line length, but 76 is the standard for base64.

To Reproduce

const yaml = require("yaml");

// Get some base64 data (any long string with no whitespace would do)
const base64data = btoa("testing ".repeat(20));

// Split it over lines of length 76
const base64lines = base64data.replace(/(.{76})/g, "$1\n");

const data = {
  foo: {
    bar: {
      baz: {
        base64: base64lines,
      },
    },
  },
};

console.log(yaml.stringify(data));

Expected behaviour

foo:
  bar:
    baz:
      base64: |-
        dGVzdGluZyB0ZXN0aW5nIHRlc3RpbmcgdGVzdGluZyB0ZXN0aW5nIHRlc3RpbmcgdGVzdGluZyB0
        ZXN0aW5nIHRlc3RpbmcgdGVzdGluZyB0ZXN0aW5nIHRlc3RpbmcgdGVzdGluZyB0ZXN0aW5nIHRl
        c3RpbmcgdGVzdGluZyB0ZXN0aW5nIHRlc3RpbmcgdGVzdGluZyB0ZXN0aW5nIA==

Versions (please complete the following information):

  • Environment: Node v21.7.1
  • yaml: 2.6.0

rjmunro added the bug label Oct 17, 2024

eemeli (Owner) commented Oct 20, 2024

There is a real issue here, but I'm not sure if it should be solved.

By default, we assume that the content in YAML is human-readable and therefore contains some whitespace. Further, if we're dealing with multiline content with lines that need folding, we assume that the > style is appropriate. In this case, these assumptions produce suboptimal results: the lines contain no whitespace, so they cannot be folded and still overflow the soft wrap boundary, which is at 80 characters by default.

It's possible to work around this in at least three different ways:

  1. Extend the line width with { lineWidth: 100 } so that the indented contents don't overflow.
  2. Enforce the block quote style with { blockQuote: 'literal' }.
  3. Represent the binary data as a Buffer or Uint8Array, and use { customTags: ['binary'] }. This will tag the value as !!binary, which is supported by parse() by default:
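    const assert = require('node:assert');
    const YAML = require('yaml');

    // Pass the raw bytes; the binary tag takes care of the base64 encoding
    const base64 = Buffer.from('testing '.repeat(20));
    const data = { foo: { bar: { baz: { base64 } } } };
    const str = YAML.stringify(data, { customTags: ['binary'] });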
    foo:
      bar:
        baz:
          base64: !!binary |-
            dGVzdGluZyB0ZXN0aW5nIHRlc3RpbmcgdGVzdGluZyB0ZXN0aW5nIHRlc3RpbmcgdGVzdGlu
            ZyB0ZXN0aW5nIHRlc3RpbmcgdGVzdGluZyB0ZXN0aW5nIHRlc3RpbmcgdGVzdGluZyB0ZXN0
            aW5nIHRlc3RpbmcgdGVzdGluZyB0ZXN0aW5nIHRlc3RpbmcgdGVzdGluZyB0ZXN0aW5nIA==
    const parsed = YAML.parse(str);
    assert(parsed.foo.bar.baz.base64 instanceof Buffer);

Now, despite all that, it would still be nice to detect the overflow with > and default to | block quote style in the original case. I may experiment with this a bit.
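One way such a check could look is sketched below. This is only an illustration of the idea, not the library's implementation: `pickBlockStyle` and its parameters are hypothetical names, and it assumes the available width is `lineWidth` minus the indent, and that a line can only be folded at whitespace between two non-space characters.

```javascript
// Hypothetical heuristic, not part of the yaml package:
// prefer the folded (>) style only when folding can actually shorten a line.
function pickBlockStyle(value, lineWidth = 80, indentLength = 0) {
  const limit = lineWidth - indentLength;
  const overflowing = value.split("\n").filter((line) => line.length > limit);

  // Nothing overflows, so there is nothing to fold.
  if (overflowing.length === 0) return "literal";

  // Assumption: a fold point needs whitespace between two non-space characters.
  const foldable = overflowing.some((line) => /\S[ \t]+\S/.test(line));
  return foldable ? "folded" : "literal";
}

// The data from the original report: 76-character lines with no whitespace,
// emitted at an indent of 8 columns.
const base64lines = btoa("testing ".repeat(20)).replace(/(.{76})/g, "$1\n");
console.log(pickBlockStyle(base64lines, 80, 8)); // "literal"

// Ordinary prose that overflows can still be folded at its spaces.
console.log(pickBlockStyle("word ".repeat(30).trim(), 80, 8)); // "folded"
```

With the report's data the sketch picks the literal style, because the two 76-character lines exceed the 72 available columns but offer no place to fold.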
