@aolieman

This PR adds a reference implementation of the Significant Words Language Model by Dehghani et al.

I also updated the package to be compatible with Python 3.7, added type annotations, and wrote some tests. The API remains backwards-compatible, but as it stands the package no longer runs on Python 2.x.

I can see three roads going forward:

  • drop py 2.x compatibility in a new (major) release
  • add cross-version compatibility using e.g. six
  • do not merge this PR, in which case I intend to release it as e.g. weighwords3

Please let me know if this contribution has your interest, and what needs to be done to include it in a new release on PyPI.

aolieman added 30 commits April 1, 2019 17:35
(these NaNs are caused by `-inf - -inf`)
ignore out-of-vocabulary terms:
this prevents errors but does not "handle unseen words" as expressed in #1
format some docstrings
switched to backwards-compatible annotations, because while python may
be PEP 585-ready, mypy does not deal with builtin generics yet
(*facepalm* for major oversight)
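The annotation commit above mentions falling back to backwards-compatible annotations because mypy did not yet handle builtin generics. A minimal sketch of what that style looks like, assuming nothing about the package's actual code (the function `count_terms` is hypothetical and used only for illustration):

```python
from typing import Dict, List


def count_terms(docs: List[List[str]]) -> Dict[str, int]:
    """Count term occurrences across tokenized documents.

    Hypothetical helper, not part of the package. It uses typing.List and
    typing.Dict rather than the PEP 585 spellings list[...] and dict[...].
    """
    counts: Dict[str, int] = {}
    for doc in docs:
        for term in doc:
            counts[term] = counts.get(term, 0) + 1
    return counts
```

The `typing` aliases are understood by both older interpreters and older type checkers, which is presumably why they were the safer choice at the time.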
aolieman added 7 commits May 17, 2019 21:42
parsimonize specific with fixed w=1/3;
format floats in lambda logging
(this can significantly reduce the memory complexity)
moved test fixture;
updated copyright in license statement
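One of the commit notes above attributes NaNs to `-inf - -inf`. A small sketch of that arithmetic, assuming the values are log-probabilities in which `-inf` stands for log(0); the variable names are illustrative and not taken from the package:

```python
import math

neg_inf = float("-inf")  # log(0) in log-probability space (assumption)

# Under IEEE 754, subtracting -inf from -inf is undefined and yields NaN.
diff = neg_inf - neg_inf
is_nan = math.isnan(diff)

# A finite value minus -inf is well defined (+inf), so only the case
# where both operands are -inf produces NaN.
finite_minus_neg_inf = -1.5 - neg_inf
```

Checking with `math.isnan` is one way to detect the problem case; how the PR itself resolves it is not shown in this conversation.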
@aolieman aolieman changed the title Add SignificantWordsLM, tests, and py3 compatibility Add SignificantWordsLM, tests, and py3 compatibility May 18, 2019
@aolieman
Author

aolieman commented Jun 9, 2019

I got impatient and (also) released my additions under a new name:
https://github.com/aolieman/wayward
https://wayward.readthedocs.io/

