Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TOML] Renderer never renders maps as inline values #61

Open
archer-321 opened this issue May 30, 2024 · 5 comments
Open

[TOML] Renderer never renders maps as inline values #61

archer-321 opened this issue May 30, 2024 · 5 comments

Comments

@archer-321
Copy link

Context

AFAICT, the current implementation of toml.Renderer has code to render map-like elements as inline tables:
toml.pkl:146

However, the current implementation doesn't seem to use that code, as every type that gets converted to a Map ends up rendering as a regular table.

Suggested change

While inlining small tables generally improves readability, large tables can quickly lead to very long lines if inlined.
To determine whether a table would be small enough to inline, the implementation could

  • define a maximum number of keys above which it generates a regular table instead.
    • This could be too inaccurate, as long values like strings or even nested tables might lead to long inline tables regardless.
  • render any map fully inline and compare it with a maximum line length.
    • This would be accurate, but especially for larger tables, this might lead to performance problems
  • add an annotation that makes the renderer inline the table.
    • If the annotation is specific to pkl.toml, this would force general packages to depend on a rendering-specific package. Moreover, it would be impossible to inline third-party types.

I understand that none of the suggestions above are ideal, but considering no issue exists for this segment of dead code, I opened this issue anyway.

Example

The examples below were generated using [email protected], and pkl --version yields 0.26.0-dev+47f161a (I built the native binary locally).

The following example contains a mapping of usernames to the user's UID and GID:

import "package://pkg.pkl-lang.org/pkl-pantry/[email protected]#/toml.pkl"

output {
    renderer = new toml.Renderer {}
}

users {
    foo {
        uid = 1000
        gid = 1000
    }
    cups {
        uid = 209
        gid = 209
    }
    nobody {
        uid = 65534
        gid = 65534
    }
}

Currently, it produces the following document:

[users.foo]
uid = 1000
gid = 1000

[users.cups]
uid = 209
gid = 209

[users.nobody]
uid = 65534
gid = 65534

However, the same data could be expressed in TOML using inline tables:

[users]
foo = { uid = 1000, gid = 1000 }
cups = { uid = 209, gid = 209 }
nobody = { uid = 65534, gid = 65534 }

While the verbose tables that the current implementation generates are still readable in this basic example, the additional tables quickly become a problem if many small objects are part of another table. For example, the following version renders as five tables and an array of tables (two [[]] tables).

Second example
import "package://pkg.pkl-lang.org/pkl-pantry/[email protected]#/toml.pkl"

output {
    renderer = new toml.Renderer {}
}


machines = new Listing {
    new {
        hostname = "office"
        users {
            foo {
                uid = 1000
                gid = 1000
            }
            cups {
                uid = 209
                gid = 209
            }
            nobody {
                uid = 65534
                gid = 65534
            }
        }
    }
    new {
        hostname = "server"
        users {
            admin {
                uid = 1000
                gid = 1000
            }
            named {
                uid = 40
                gid = 40
            }
        }
    }
}

This generates:

[[machines]]
hostname = "office"

[machines.users.foo]
uid = 1000
gid = 1000

[machines.users.cups]
uid = 209
gid = 209

[machines.users.nobody]
uid = 65534
gid = 65534

[[machines]]
hostname = "server"

[machines.users.admin]
uid = 1000
gid = 1000

[machines.users.named]
uid = 40
gid = 40

A version using inline tables could represent the same data using two tables and an array of tables or even just the array of tables if the users table is inlined as well (even though that would lead to very long lines).

@bioball
Copy link
Contributor

bioball commented May 30, 2024

The TOML renderer will output inline tables in some circumstances, for example, if it is producing a mixed object:

import "package://pkg.pkl-lang.org/pkl-pantry/[email protected]#/toml.pkl"

res {
  1
  new { foo = 2 }
}

output {
    renderer = new toml.Renderer {}
}

This produces:

res = [ 1, { foo = 2 } ]

What is the motivation for wanting inline tables? FWIW: trying to make rendered output textually equivalent to some output is somewhat of a never-ending battle. Generally, you should trust that Pkl is producing data that is semantically equivalent to your desired output.

@archer-321
Copy link
Author

archer-321 commented May 30, 2024

The TOML renderer will output inline tables in some circumstances, for example, if it is producing a mixed object

Oh, I missed mixed-type listings (or other mixed-type objects for which toMap() returns an empty map). Looking at the code again, it makes sense why those would end up inlined.
I assume TOML's syntax wouldn't allow for a non-inline table definition in arrays that can't be represented as an array of tables.

However, this implies objects with properties/entries can never generate inline tables.

What is the motivation for wanting inline tables? FWIW: trying to make rendered output textually equivalent to some output is somewhat of a never-ending battle. Generally, you should trust that Pkl is producing data that is semantically equivalent to your desired output.

I'm currently exploring Pkl as a "configuration preprocessor" with validation at configuration time. While the generated configuration files would not be edited manually, having human-readable configuration files would allow manual inspection even if the source .pkl files aren't available.

That said, one reason I opened this issue was that I considered the inline table code to be unused. With mixed-type arrays even requiring this syntax, this issue is reduced to the question of how much Pkl focuses on human-readable output.

@bioball
Copy link
Contributor

bioball commented May 30, 2024

That said, one reason I opened this issue was that I considered the inline table code to be unused. With mixed-type arrays even requiring this syntax, this issue is reduced to the question of how much Pkl focuses on human-readable output.

We definitely want to produce human-readable output. Readability is a little subjective, though; I don't know that this is harder to read:

[users.foo]
uid = 1000
gid = 1000

[users.cups]
uid = 209
gid = 209

[users.nobody]
uid = 65534
gid = 65534

Do you have a suggestion for what the heuristic should be? At what point should the renderer emit inline tables?

@archer-321
Copy link
Author

Do you have a suggestion for what the heuristic should be? At what point should the renderer emit inline tables?

This is indeed a problem, as the three potential implementations I listed in the issue description all have drawbacks.
Personally, I inline tables if they have few, short values, e.g., tables with <= 5 keys, no nested tables, and no long strings.
An alternative metric could be the resulting line length. E.g., if the line is longer than 100 characters, a regular table is used instead.

An alternative that wouldn't require a heuristic would be a way for implementations to opt in to inlining. As an example, a property shouldInline: (unknown) -> Boolean in toml.Renderer could allow the user to implement custom heuristics.

We definitely want to produce human-readable output. Readability is a little subjective, though; I don't know that this is harder to read

Indeed, the first example is still readable. For me, readability only becomes problematic when using "small tables" in an array of tables.
The problem is that TOML's table syntax makes it hard to spot "supertables" when defining a lot of "subtables" using the non-inlined syntax. This is an inherent limitation of TOML, as the language heavily favours "flat" configuration schemas over deeply nested ones. However, adding a way to inline small tables can help maintain readability while grouping related fields.

For example, let's consider the following Pkl definition:

import "package://pkg.pkl-lang.org/pkl-pantry/[email protected]#/toml.pkl"

output {
    renderer = new toml.Renderer {}
}

class Theme {
    name: String
    primaryColor: Color
    author: Author
}

class Color {
    red: Int(isBetween(0, 255))
    green: Int(isBetween(0, 255))
    blue: Int(isBetween(0, 255))
}

class Author {
    name: String
    email: String(matches(Regex(".*@.*")))
}

class SiteConfiguration {
    themes: Listing<Theme>
    // The properties below could use classes as well
    languages: Listing<String>
    features: Listing<String>
}

`site-configuration`: SiteConfiguration

Let's write a configuration file amending the site-configuration key with the following data:

`site-configuration`: SiteConfiguration = new {
    languages {
        "de_DE"
        "en_GB"
        "en_US"
        // ...
    }
    features {
        "public-uploads"
        "group-creation"
        "shared-ci-runners"
        "logo-rebrand"
    }
    themes {
        new {
            name = "Dark Theme"
            primaryColor {
                red = 0x22
                green = 0x27
                blue = 0x2E
            }
            author {
                name = "Example Company"
                email = "[email protected]"
            }
        }

        new {
            name = "Light Theme"
            primaryColor {
                red = 0xC5
                green = 0xD1
                blue = 0xDE
            }
            author {
                name = "Example Company"
                email = "[email protected]"
            }
        }

        new {
            name = "High Contrast Light Theme"
            primaryColor {
                red = 0xFF
                green = 0xFF
                blue = 0xFF
            }
            author {
                name = "Other Company"
                email = "noreply@localhost"
            }
        }

        new {
            name = "High Contrast Dark Theme"
            primaryColor {
                red = 0x00
                green = 0x00
                blue = 0x00
            }
            author {
                name = "Other Company"
                email = "noreply@localhost"
            }
        }
    }
}

While the Pkl code for this configuration is also very long, the indentation and blocks using { } keep this configuration still relatively readable.

Currently, this code generates the following output:

[site-configuration]
languages = [ "de_DE", "en_GB", "en_US" ]
features = [ "public-uploads", "group-creation", "shared-ci-runners", "logo-rebrand" ]

[[site-configuration.themes]]
name = "Dark Theme"

[site-configuration.themes.primaryColor]
red = 34
green = 39
blue = 46

[site-configuration.themes.author]
name = "Example Company"
email = "[email protected]"

[[site-configuration.themes]]
name = "Light Theme"

[site-configuration.themes.primaryColor]
red = 197
green = 209
blue = 222

[site-configuration.themes.author]
name = "Example Company"
email = "[email protected]"

[[site-configuration.themes]]
name = "High Contrast Light Theme"

[site-configuration.themes.primaryColor]
red = 255
green = 255
blue = 255

[site-configuration.themes.author]
name = "Other Company"
email = "noreply@localhost"

[[site-configuration.themes]]
name = "High Contrast Dark Theme"

[site-configuration.themes.primaryColor]
red = 0
green = 0
blue = 0

[site-configuration.themes.author]
name = "Other Company"
email = "noreply@localhost"

Especially if we now start using classes for languages and features, this quickly becomes hard to read.
TOML would allow the following result using inline tables, which I consider more readable:

[site-configuration]
languages = [ "de_DE", "en_GB", "en_US" ]
features = [ "public-uploads", "group-creation", "shared-ci-runners", "logo-rebrand" ]

[[site-configuration.themes]]
name = "Dark Theme"
primaryColor = { red = 34, green = 39, blue = 46 }
author = { name = "Example Company", email = "[email protected]" }

[[site-configuration.themes]]
name = "Light Theme"
primaryColor = { red = 197, green = 209, blue = 222 }
author = { name = "Example Company", email = "[email protected]" }

[[site-configuration.themes]]
name = "High Contrast Light Theme"
primaryColor = { red = 255, green = 255, blue = 255 }
author = { name = "Other Company", email = "noreply@localhost" }

[[site-configuration.themes]]
name = "High Contrast Dark Theme"
primaryColor = { red = 0, green = 0, blue = 0 }
author = { name = "Other Company", email = "noreply@localhost" }

Of course, nested tables in TOML will always be limited to a few levels before they become unreadable, and I understand that inline tables aren't a magic trick that will make all problems go away and turn TOML into YAML.

However, in many configuration formats I'm working with, closely related fields (like the colour values of primaryColor) are grouped together into separate objects, even if the configuration schema tries to be flat overall. Even being able to reduce one level of non-inlined tables would help those files become significantly more readable.

@bioball
Copy link
Contributor

bioball commented Jun 4, 2024

That's fair. TOML's method for nesting arrays of objects as top-level tables makes it hard to maintain context when reading through a file.

Their inline tables are also kind of limiting because they don't allow newlines. Just turning things into inline tables can also be rough.

We'll probably add the TOML renderer to the standard library. I think this is something that we'll look to improve when we do that, so we probably won't prioritize this fix for the pkl.toml package. If you'd like, though, feel free to submit a PR!

These heuristics can be added as properties to the Renderer class.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants