
@mmakowski commented on Jan 29, 2018

I have made a few changes for my own use that I think others might benefit from, hence this pull request. The new features are:

  • support for sparse matrices, which are common in NLP tasks: previously the transformation failed when given a sparse matrix; now it should process it
  • removal of features that end up with only a single bucket; this speeds up processing and reduces memory pressure in very high-dimensional problems
  • parallel fit, with the customary n_jobs parameter controlling the degree of parallelism. The transform has not been parallelised yet, but since it is orders of magnitude faster than the fit, that is not very important in my opinion.
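To make the three features concrete, here is a minimal sketch of how a per-feature bucketing wrapper could accept sparse input, drop single-bucket features and fit columns in parallel. It is not the code in this PR: `ColumnBinner` and `_fit_column` are hypothetical names, equal-frequency (quantile) bucketing stands in for whatever the native part actually computes, and threads from `concurrent.futures` stand in for joblib. It assumes numpy and scipy are installed; `n_jobs=-1` follows the scikit-learn convention of using all CPUs.

```python
import os
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import scipy.sparse as sp


def _fit_column(values, n_buckets):
    """Hypothetical stand-in: equal-frequency bucket edges for one feature.

    Returns None when every training value lands in the same bucket,
    which is the signal to drop the feature.
    """
    qs = np.linspace(0, 1, n_buckets + 1)[1:-1]
    edges = np.unique(np.quantile(values, qs))
    if np.unique(np.digitize(values, edges)).size < 2:
        return None  # feature ended up with a single bucket
    return edges


class ColumnBinner:
    """Hypothetical per-feature binner; NOT the library's actual API."""

    def __init__(self, n_buckets=4, n_jobs=1):
        self.n_buckets = n_buckets
        self.n_jobs = n_jobs  # -1 means "use all CPUs" (scikit-learn convention)

    @staticmethod
    def _prepare(X):
        # CSC makes column slicing cheap; dense input is used as-is.
        return sp.csc_matrix(X) if sp.issparse(X) else np.asarray(X, dtype=float)

    @staticmethod
    def _column(X, j):
        # Dense 1-D copy of column j; for CSC input only this column is densified.
        col = X[:, j]
        return col.toarray().ravel() if sp.issparse(col) else np.asarray(col).ravel()

    def fit(self, X):
        X = self._prepare(X)
        columns = range(X.shape[1])

        def fit_one(j):
            return _fit_column(self._column(X, j), self.n_buckets)

        if self.n_jobs == 1:
            edges = [fit_one(j) for j in columns]
        else:
            workers = os.cpu_count() if self.n_jobs == -1 else self.n_jobs
            with ThreadPoolExecutor(max_workers=workers) as pool:
                edges = list(pool.map(fit_one, columns))

        # Keep only features whose training values span more than one bucket.
        self.kept_ = [j for j, e in enumerate(edges) if e is not None]
        self.edges_ = {j: edges[j] for j in self.kept_}
        return self

    def transform(self, X):
        # Sequential, like the transform described in the PR; returns a dense
        # array of bucket indices for the kept features only.
        X = self._prepare(X)
        out = np.empty((X.shape[0], len(self.kept_)), dtype=np.int64)
        for k, j in enumerate(self.kept_):
            out[:, k] = np.digitize(self._column(X, j), self.edges_[j])
        return out


# Usage: column 1 is constant, so it collapses to one bucket and is dropped.
X_dense = np.array([[1.0, 5.0, 0.0],
                    [2.0, 5.0, 3.0],
                    [3.0, 5.0, 0.0],
                    [4.0, 5.0, 7.0]])
binner = ColumnBinner(n_buckets=2, n_jobs=2).fit(sp.csr_matrix(X_dense))
print(binner.kept_)
print(binner.transform(X_dense))
```

Converting sparse input to CSC keeps per-column slicing cheap, and each column is densified only while its own buckets are computed, so the full matrix is never made dense during the fit. Threads keep the sketch free of extra dependencies; a real wrapper might instead use joblib, whose `Parallel` is what an `n_jobs` parameter conventionally maps to.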

All of these required changes only in the Python wrapper, not in the native part, and they should be fully backwards-compatible.

If you would prefer only some of these features, I will be happy to split this PR up. I also appreciate that they might not fit your vision of how the library should evolve; if so, no problem at all, I will just use my fork.
