Example situation: an operation creates a different object with the same 10 columns (e.g. `rows(t)`); MemPool thinks this operation needs to free up 1 GB of space and starts evicting objects to disk.
The problem is that MemPool is not accounting for the vectors it writes to disk.
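To make the double-counting concrete, here is a minimal stand-in in plain Julia (not JuliaDB or MemPool.jl code): two objects share the same column vectors, but summing their sizes independently reports roughly twice the memory that is actually live.

```julia
# Minimal stand-in for the scenario above; not JuliaDB/MemPool.jl code.
# Ten column vectors play the role of table t; `rows_obj` plays the role of
# rows(t) and shares the very same vectors rather than copying them.
cols = Dict(Symbol(:c, i) => rand(Float64, 1_000_000) for i in 1:10)
table_obj = cols                       # stand-in for the table t
rows_obj  = collect(values(cols))      # stand-in for rows(t); shares the vectors

# Summing per-object sizes counts the shared vectors once per object, so an
# accounting scheme built this way believes it must free ~2x the memory in use.
naive_total = Base.summarysize(table_obj) + Base.summarysize(rows_obj)
```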
Based on @tanmaykm's suggested fix (see the sketch after this list):

- designate every vector with an ID when it gets written to wire or disk using MemPool
- keep a shared dictionary that maintains a ref-count for each vector, keyed by its ID
- when evicting a vector from working memory by writing it to disk, record the file and offset in the shared dictionary; later spills of objects that contain the vector point to that file and offset instead of writing the vector out again
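A minimal sketch of that scheme, under stated assumptions (the names `VectorRef`, `VECTOR_REGISTRY`, `register_vector!`, and `record_spill!` are illustrative, not MemPool.jl's actual API):

```julia
# Hypothetical sketch of the suggested fix; not MemPool.jl's API.
# Each vector gets an ID when first written to wire or disk; a shared
# dictionary ref-counts it and records where it was spilled.
mutable struct VectorRef
    refcount::Int
    file::Union{Nothing,String}   # file the vector was spilled to, if any
    offset::Int                   # byte offset of the vector within that file
end

const VECTOR_REGISTRY = Dict{UInt64,VectorRef}()   # would need to be cluster-shared

# Bump the ref-count, creating an entry the first time the ID is seen.
register_vector!(id::UInt64) =
    (get!(VECTOR_REGISTRY, id, VectorRef(0, nothing, 0)).refcount += 1)

# When evicting: record (file, offset) once; later spills of objects that
# contain this vector store only the ID, not another copy of the data.
function record_spill!(id::UInt64, file::String, offset::Int)
    ref = VECTOR_REGISTRY[id]
    ref.file, ref.offset = file, offset
end
```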
This has a few problems:
- a shared dictionary in a cluster is still not a thing
- when only a single vector is required from within a table, the whole file containing the spilled table has to be kept around, which requires fairly thorough bookkeeping; one alternative is to write each vector to its own file, but that can overwhelm a file system with the sheer number of files, and another is to do manual page management (see the sketch below)
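A sketch of the bookkeeping involved, assuming the registry above and a flat Float64 layout (`load_vector` and `spill_path` are hypothetical helpers, not MemPool.jl API): reading back a single spilled vector needs its (file, offset, length) record, and the whole table file must stay on disk as long as any one of its vectors is still referenced.

```julia
# Hypothetical helper: read one spilled vector back from inside a whole-table
# file, assuming it was written as a contiguous block of Float64 values.
# The entire table file must remain on disk while any such vector is live.
function load_vector(file::String, offset::Int, len::Int)
    open(file, "r") do io
        seek(io, offset)
        read!(io, Vector{Float64}(undef, len))
    end
end

# The per-vector-file alternative avoids keeping the table file around, at the
# cost of one (possibly tiny) file per vector ID:
spill_path(dir::String, id::UInt64) = joinpath(dir, string(id, ".bin"))
```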