Improve priority queue memory usage #7

@gochujang-c

Description

This priority queue keeps the guesses in probability order. It also accounts for most of the memory usage, and it is what takes up all the space in the checkpoint (the state shared between nodes).

At the very least, we could flatten the structure of each ParseTree in the queue. Currently, the structure is as follows:

record ParseTree(double baseProbability, double probability, ReplacementSet replacementSet) {}
record ReplacementSet(List<String> variables, int[] indices) {}
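To make "probability order" concrete, here is a minimal sketch of the current nested records kept in a java.util.PriorityQueue with the most probable guess at the head. The wrapper class `CurrentQueueSketch`, the `newQueue` helper, the comparator and the sample base structure "L4 D2" are illustrative assumptions, not the project's actual code.

```java
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class CurrentQueueSketch {
    // Nested layout as described in the issue: every ParseTree points to a
    // separate ReplacementSet object.
    record ReplacementSet(List<String> variables, int[] indices) {}

    record ParseTree(double baseProbability, double probability, ReplacementSet replacementSet) {}

    // Max-heap on probability, so poll() returns the most probable guess first.
    // Assumption: ties are left in arbitrary order.
    static PriorityQueue<ParseTree> newQueue() {
        return new PriorityQueue<>(
                Comparator.comparingDouble(ParseTree::probability).reversed());
    }

    public static void main(String[] args) {
        PriorityQueue<ParseTree> queue = newQueue();
        // Hypothetical base structure: 4 letters followed by 2 digits.
        List<String> base = List.of("L4", "D2");
        queue.add(new ParseTree(0.2, 0.02, new ReplacementSet(base, new int[] {0, 1})));
        queue.add(new ParseTree(0.2, 0.05, new ReplacementSet(base, new int[] {0, 0})));
        System.out.println(queue.poll().probability()); // prints 0.05
    }
}
```

Note that every entry carries its own copy of baseProbability and its own ReplacementSet object, even when many entries share the same base structure.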

The only reason for nesting the ReplacementSet is readability; the original implementation structures it the same way. We could inline this object into ParseTree, saving the overhead of an extra object. Besides that, baseProbability and variables (the latter simply being the base structure) are really part of the trained PCFG rule. Instead of duplicating this information in every entry, we can store an index to the base structure entry, saving the 8-byte baseProbability double as well as the difference between an object reference and an int field.
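A rough per-entry estimate of the saving, assuming a 64-bit HotSpot JVM with compressed oops and compressed class pointers (the defaults for heaps under 32 GB): a 12-byte object header, 4-byte references, and object sizes rounded up to a multiple of 8, written here as $\lceil x \rceil_8$. The int[] array and the shared variables list are left out because they are the same in both layouts. Actual layouts vary by JVM and flags, so treat this as an approximation to check with a layout tool.

```latex
\begin{aligned}
\text{current:}\quad & \underbrace{(12 + 8 + 8 + 4)}_{\texttt{ParseTree}\,=\,32}
  + \underbrace{\lceil 12 + 4 + 4 \rceil_8}_{\texttt{ReplacementSet}\,=\,24} = 56 \text{ bytes} \\
\text{flattened:}\quad & \lceil 12 + 4 + 8 + 4 \rceil_8 = \lceil 28 \rceil_8 = 32 \text{ bytes} \\
\text{saved:}\quad & 56 - 32 = 24 \text{ bytes per entry} \;(\approx 43\%)
\end{aligned}
```

Under these assumptions a reference and an int are both 4 bytes, so most of the gain comes from dropping the ReplacementSet object and the duplicated double.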

The result would be:

record ParseTree(int baseStructureIndex, double probability, int[] variableIndices) {}
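A minimal sketch of how the flattened record could be used, assuming the trained base structures are available as a shared, read-only array indexed by baseStructureIndex. The `BaseStructure` record, the `FlatQueueSketch` class and its accessor methods are hypothetical; they show the extra dereference each lookup needs and one way to keep call sites readable.

```java
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class FlatQueueSketch {
    // Hypothetical view of one trained PCFG base-structure rule entry.
    record BaseStructure(List<String> variables, double probability) {}

    // Flattened queue entry: no nested object, base structure stored as an index.
    record ParseTree(int baseStructureIndex, double probability, int[] variableIndices) {}

    // Shared rule table (assumption: a plain array indexed by baseStructureIndex).
    private final BaseStructure[] baseStructures;

    FlatQueueSketch(BaseStructure[] baseStructures) {
        this.baseStructures = baseStructures;
    }

    // Base probability and variables are looked up in the rule table
    // instead of being stored in every queue entry.
    double baseProbability(ParseTree tree) {
        return baseStructures[tree.baseStructureIndex()].probability();
    }

    List<String> variables(ParseTree tree) {
        return baseStructures[tree.baseStructureIndex()].variables();
    }

    // Max-heap on probability, same ordering as before.
    static PriorityQueue<ParseTree> newQueue() {
        return new PriorityQueue<>(
                Comparator.comparingDouble(ParseTree::probability).reversed());
    }

    public static void main(String[] args) {
        var sketch = new FlatQueueSketch(new BaseStructure[] {
            new BaseStructure(List.of("L4", "D2"), 0.2),
            new BaseStructure(List.of("L6"), 0.1),
        });
        PriorityQueue<ParseTree> queue = newQueue();
        queue.add(new ParseTree(1, 0.04, new int[] {0}));
        queue.add(new ParseTree(0, 0.05, new int[] {0, 0}));
        ParseTree best = queue.poll();
        System.out.println(sketch.variables(best) + " " + sketch.baseProbability(best)); // prints [L4, D2] 0.2
    }
}
```

Wrapping the lookups in small accessors like these keeps most of the readability of the nested version while the queue itself stores only an int, a double and the index array.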

However, this has some downsides:

  • Less readable code
  • Dereferencing is needed on each access (though this is probably easily outweighed by other operations)
  • Smaller entries mean we can keep a larger queue in memory, but that also results in a larger state to be shared between nodes

If the results are still less than ideal, we could look into dynamic compression or even modify the generation algorithm.
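As one possible direction for compression (a swapped-in technique, not something this issue specifies), the variableIndices could be stored as base-128 varints when writing the checkpoint, on the assumption that most indices are small. The `VarintSketch` class and its `encode`/`decode` methods are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

public class VarintSketch {
    // Encode non-negative ints as unsigned base-128 varints:
    // 7 payload bits per byte, high bit set when more bytes follow.
    static byte[] encode(int[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int v : values) {
            if (v < 0) throw new IllegalArgumentException("negative index: " + v);
            while ((v & ~0x7F) != 0) {
                out.write((v & 0x7F) | 0x80); // low 7 bits plus continuation flag
                v >>>= 7;
            }
            out.write(v); // last byte, continuation flag clear
        }
        return out.toByteArray();
    }

    // Decode exactly `count` values written by encode().
    static int[] decode(byte[] bytes, int count) {
        int[] values = new int[count];
        int pos = 0;
        for (int i = 0; i < count; i++) {
            int result = 0;
            int shift = 0;
            int b;
            do {
                b = bytes[pos++] & 0xFF;
                result |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            values[i] = result;
        }
        return values;
    }

    public static void main(String[] args) {
        int[] indices = {0, 3, 1, 200};
        byte[] packed = encode(indices);
        // Fixed-width ints need 16 bytes here; varints need 5 (200 takes 2 bytes).
        System.out.println(packed.length + " bytes vs " + indices.length * Integer.BYTES);
        System.out.println(Arrays.toString(decode(packed, indices.length)));
    }
}
```

Varints trade a little CPU for size and only pay off when indices stay small; a general-purpose compressor over the whole checkpoint would be an alternative worth measuring.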

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests

    Issue actions