de/serialization and validation #13

Open
thejohnfreeman opened this issue Mar 28, 2015 · 4 comments

Comments

@thejohnfreeman
Collaborator

Each type needs a deserializer. Most types have a constructor (e.g. int, float, bool), and that is enough. Clients should be able to provide alternatives, though, because a deserializer also encompasses validation.

Some clients may want to serialize configurations. Most types have a __str__ method, and that is enough. Clients should be able to provide alternatives, though, in case __str__ does not exist or match the deserializer.

I saw the issue on Confit templates, and the idea is very similar. Going back to my assumptions (#7), though, I think it needs to be mandatory, not optional. A simple dict will be enough to define both types and defaults, however, so it should not be burdensome.
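
A minimal sketch of what such a mandatory schema could look like, assuming a plain dict that maps each option to a (deserializer, default) pair. The names `SCHEMA`, `port_number`, and `load` are hypothetical and only illustrate the idea; they are not part of any existing API.

```python
def port_number(raw):
    """Deserialize and validate in one step: int() parses, the range check validates."""
    value = int(raw)
    if not 0 < value < 65536:
        raise ValueError(f"port out of range: {value}")
    return value

# Hypothetical schema: option name -> (deserializer, default).
SCHEMA = {
    "host": (str, "localhost"),     # a plain constructor is enough here
    "port": (port_number, 8080),    # client-provided alternative with validation
    "ratio": (float, 0.5),
}

def load(raw, schema=SCHEMA):
    """Apply each deserializer to the raw value, falling back to the default."""
    unknown = set(raw) - set(schema)
    if unknown:
        raise KeyError(f"unknown options: {sorted(unknown)}")
    return {
        name: deserialize(raw[name]) if name in raw else default
        for name, (deserialize, default) in schema.items()
    }

config = load({"port": "9000", "ratio": "0.25"})
```

Because every option must appear in the schema, a misspelled key is rejected instead of being silently ignored.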

@sampsyo
Member

sampsyo commented Mar 29, 2015

Indeed. In Confit, you can write view.get(int) to ensure that you get an int back, but you can also write view.get() to get whatever unsanitized data the YAML gives us. (I don't have a particular attachment to that unsanitized mode; it's definitely a hack.) Here, int is just a convenient alias for a full-fledged template class that deserializes integers—a little overkill for that case, but that's how deserialization is optionally made client-specific.

The neat thing about Confit currently is that the view system lets you be flexible about when you do validation for what. You can of course validate an entire configuration at once, but if you have a very large application, you can modularize the validation. For example, if you have two variables, foo and bar, you can either:

validated = config.get({'foo': int, 'bar': str})
print(validated['foo'], validated['bar'])

or:

print(config['foo'].get(int))
print(config['bar'].get(str))
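
To make the two styles concrete, here is a toy stand-in for a view. It is a sketch only: the `View` class below is hypothetical and much simpler than Confit's real templates (for instance, it just calls the template as a constructor), but it shows how whole-configuration and per-key validation can share one `get` method.

```python
class View:
    """Toy stand-in for a configuration view; not Confit's implementation."""

    def __init__(self, data):
        self._data = data

    def __getitem__(self, key):
        # Sub-views are created on demand; nothing is validated yet.
        return View(self._data[key])

    def get(self, template=None):
        if template is None:
            return self._data  # unsanitized mode: return the raw data
        if isinstance(template, dict):
            # Validate a whole mapping by validating each key's sub-view.
            return {key: self[key].get(sub) for key, sub in template.items()}
        return template(self._data)  # assume the template is a callable

config = View({"foo": "42", "bar": 7})

# All at once:
whole = config.get({"foo": int, "bar": str})

# Or one key at a time, wherever the value is needed:
piecewise = (config["foo"].get(int), config["bar"].get(str))
```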

On serialization: Would a goal be to re-emit a valid configuration file that could feasibly be parsed again? This could be useful, for example, if an application wants to be able to programmatically update the configuration file.

@thejohnfreeman
Collaborator Author

That is exactly the goal of serialization. It especially comes in handy for debugging and auditing, where we want to keep around the exact configuration a previous run used. Doing "round-trip" editing (rewriting the user's configuration file with updates) is tricky, because we'll want to preserve comments and whitespace as much as possible. There are some libraries that try to implement that for us, though, like ruamel.yaml.
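
For the simpler auditing case (no comment preservation), a rough sketch of serializing a configuration and reading it back might look like the following. The `CODECS` table and `parse_bool` are hypothetical names. The bool entry shows why `__str__` does not always match the constructor: `bool("False")` is `True`, so that option needs a custom deserializer.

```python
def parse_bool(text):
    """bool("False") is True, so the constructor cannot undo str(False)."""
    lowered = text.strip().lower()
    if lowered in ("true", "yes", "1"):
        return True
    if lowered in ("false", "no", "0"):
        return False
    raise ValueError(f"not a boolean: {text!r}")

# Hypothetical per-option codecs: name -> (deserializer, serializer).
CODECS = {
    "jobs": (int, str),
    "verbose": (parse_bool, str),
}

def serialize(config):
    """Turn each value into text using its option's serializer."""
    return {name: CODECS[name][1](value) for name, value in config.items()}

def deserialize(raw):
    """Turn each piece of text back into a value using its deserializer."""
    return {name: CODECS[name][0](text) for name, text in raw.items()}

config = {"jobs": 4, "verbose": False}
emitted = serialize(config)      # what would be written to the audit log
restored = deserialize(emitted)  # what a later run would read back
```

Checking `restored == config` is a cheap test that each serializer really matches its deserializer.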

@sampsyo
Member

sampsyo commented Mar 30, 2015

Yeah, a full round-trip would be tricky. FWIW, Confit currently tries to get halfway there—it includes some heuristics for interspersing comments back into the formatted YAML. ruamel.yaml looks much more complete.

A simpler format like TOML might be easier to round-trip.

@thejohnfreeman
Collaborator Author

I hadn't heard of TOML before (though I've heard of the author before), but in our model it will be incredibly easy to define new input formats as functions that emit a configuration.
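
As a sketch of that model, assuming a "configuration" is just a dict of raw values, each input format is one function from text to that dict. The function names and the `FORMATS` table are hypothetical; a TOML reader would be one more entry in the table.

```python
import json

def from_json(text):
    """One input format: JSON text -> raw configuration dict."""
    return json.loads(text)

def from_key_value(text):
    """Another format: 'key = value' lines, skipping blanks and '#' comments."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip()
    return config

# Hypothetical registry: file extension -> format function.
FORMATS = {".json": from_json, ".conf": from_key_value}

def parse(text, extension):
    """Pick the format function by extension and emit a configuration."""
    return FORMATS[extension](text)

a = parse('{"name": "demo", "jobs": "4"}', ".json")
b = parse("# sample\nname = demo\njobs = 4\n", ".conf")
```

Both calls emit the same raw dict, so typed deserialization and validation can stay independent of the input format.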
