Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stateless Huffman #8

Open
jimcarreer opened this issue Jul 8, 2015 · 12 comments
Open

Stateless Huffman #8

jimcarreer opened this issue Jul 8, 2015 · 12 comments

Comments

@jimcarreer
Copy link
Contributor

As discussed with @Lukasa the Huffman encoder / decoder is currently initialized using two static structures defined in huffman_constants.py. We could potentially remove the need for this pseudo-stateful nature and instead have a completely stateless Huffman encoder / decoder with the addition of another static structure related to decoding (encoding is currently trivial to make stateless). This may also offer a minor performance enhancement.

@jimcarreer
Copy link
Contributor Author

I think I'm going to take a look at this, this weekend.

@Lukasa
Copy link
Member

Lukasa commented Mar 21, 2016

An FYI, I've migrated to an essentially stateless Huffman decoding implementation in #35.

@thedrow
Copy link

thedrow commented Dec 12, 2016

So this should be closed?

@Lukasa
Copy link
Member

Lukasa commented Dec 12, 2016

Nope, we still have a stateful encoder.

@thedrow
Copy link

thedrow commented Dec 12, 2016

I can try to write it in C if that helps.

@Lukasa
Copy link
Member

Lukasa commented Dec 12, 2016

I would like to avoid us having any native extensions in this module at this time.

@thedrow
Copy link

thedrow commented Dec 12, 2016

Even optionally?

@Lukasa
Copy link
Member

Lukasa commented Dec 12, 2016

Tentatively, yes, I'm opposed to having an optional native extension.

My thinking on this is a bit complex, so let me lay it out.

  1. Generally speaking I have a policy regarding C code, which is that wherever possible I'd like to avoid having untrusted input reach C code. The risks of running C code on untrusted input are too high, and I find that generally speaking I and most software developers I work with are bad at spotting places where assumptions are made that data will be "reasonable". That makes C a great vector for bugs, and in particular for the kind of bug that does not affect a pure-Python program.

    For that reason, if we were going to write a native extension I'd want it to be in Rust, not C. However,

  2. The wheel ecosystem is not quite there yet. While it's now mostly possible to ship binary wheels for most platforms, there are a number of Linux distributions that cannot use manylinux1 wheels. This is a problem if you want to ship a Rust extension, because it's very difficult to indicate the need for a Rust compile chain on a random Linux system.

    This problem also affects C extensions, by the way: one of the systems that doesn't support a manylinux1 wheel is Alpine, which uses musl libc. That means that if we shipped a C extension on that platform, we'd require the installation of a C toolchain for users to install the module: not great.

  3. So the only option is, as you said, to make the module optional: we'd ideally want to check whether we could compile it and, if we couldn't, we'd simply fall back to pure-Python code.

    I have extremely mixed feelings about this, most of them not good. In particular, it's a bit of a hairy debugging issue: what code the user is executing becomes conditional on difficult-to-introspect features of their platform; we double the number of code paths that need to be tested and debugged; and we double the workload for feature addition and API changes.

The TL;DR here, I think, is that right now I'm about -0.25 on having a native extension. If we could come up with a nice Rust extension then I'd reconsider, but I'd really rather not maintain a C extension if it can possibly be avoided.

@Lukasa
Copy link
Member

Lukasa commented Dec 12, 2016

In fact, let state an outline of what we'd need to make me happy here.

If any contributor can write a Rust extension that can be optionally compiled if a buildchain is present, that flags its presence or absence clearly in a module-scoped dunder variable, that uses CFFI, and that doesn't vomit a load of output into the terminal on install on systems that do not have a Rust toolchain present, I'd be willing to consider merging such an extension.

@thedrow
Copy link

thedrow commented Dec 12, 2016

It shouldn't be much of a problem with https://github.com/novocaine/rust-python-ext

@Lukasa
Copy link
Member

Lukasa commented Dec 12, 2016

I've already written a Rust extension module in Python, that's not the hard bit. The harder bits are a) making it fail gracefully, b) keeping it maintained, and c) actually writing the extension. ;)

@thedrow
Copy link

thedrow commented Dec 12, 2016

Right, this might take some effort since I'm not very familiar with Rust's lifelines and borrow semantics. I will give it a shot though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants