Conversation

@shizzard (Contributor) commented Dec 2, 2024

No description provided.

h2_hashes = #{}
name = not_set :: atom(),
partition_number = not_set :: non_neg_integer() | not_set,
diff_pair = not_set, % :: ?DIFF_PAIR()
@shizzard (Contributor, Author):

@vird what is the diff pair here?

Member:

A pair of mining difficulties: the first element is the mining difficulty for one-chunk solutions, the second for two-chunk solutions.
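For illustration, a pair like the one described above could be consumed as follows. This is a sketch only: the actual ?DIFF_PAIR() macro definition is not shown in the PR, so the module, function, and tuple shape here are assumptions.

```erlang
-module(diff_pair_example).
-export([diff_for/2]).

%% Assumed shape: {OneChunkDiff, TwoChunkDiff}, i.e. the difficulty for
%% one-chunk solutions first, then the one for two-chunk solutions.
diff_for(one_chunk, {OneChunkDiff, _TwoChunkDiff}) -> OneChunkDiff;
diff_for(two_chunk, {_OneChunkDiff, TwoChunkDiff}) -> TwoChunkDiff.
```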

State#state{
sub_chunk_cache_size = maps:put(SessionKey, CacheSize + Delta,
State#state.sub_chunk_cache_size) }.
NewCacheSize = ar_util:clamp_lower(CacheSize + Delta, 0),
@shizzard (Contributor, Author):

The key change: do not let the sub_chunk_cache_size go below zero, as a negative cache size does not make any sense.
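The lower-bound clamp used in the diff above presumably behaves like this minimal sketch; the real body of ar_util:clamp_lower/2 is not shown in this excerpt, so this is an assumption based on its name and usage.

```erlang
-module(clamp_lower_example).
-export([clamp_lower/2]).

%% Return Value, but never less than LowerBound.
clamp_lower(Value, LowerBound) when Value < LowerBound -> LowerBound;
clamp_lower(Value, _LowerBound) -> Value.
```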

Member:

Aren't we fighting the symptom? Perhaps let's leave a comment that this is a temporary measure until we figure out why we subtract more than we add, if that is the case.

@shizzard (Contributor, Author):

Perhaps, yes, you're right; a proper fix will require a deep (sub)chunk cache refactoring, which is out of scope for this task.

Collaborator:

@shizzard what about adding a log_warning message whenever CacheSize + Delta is below zero? We do want to fix the OOMs as quickly as possible, and clamping may help there, but I'm a little worried we may also lose this valuable signal that something is wrong, which may make tracking down the true root cause harder.

Collaborator:

I sent some more information over Slack that might help with reproducing the negative cache size (and so might help get at the root cause).

@shizzard (Contributor, Author):

> what about adding a log_warning message

That's a very good idea, will do.
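Combining the clamp with the suggested warning might look like the sketch below, so the negative-size signal is logged before it is clamped away. The function name, event name, and log fields are assumptions for illustration, not the PR's actual code; ?LOG_WARNING comes from OTP's standard logger header.

```erlang
-module(cache_size_example).
-export([update_cache_size/2]).

-include_lib("kernel/include/logger.hrl").

%% Clamp the new cache size at zero, but warn first so the anomaly
%% (subtracting more than was added) stays visible in the logs.
update_cache_size(CacheSize, Delta) ->
    NewCacheSize = CacheSize + Delta,
    case NewCacheSize < 0 of
        true ->
            ?LOG_WARNING([{event, negative_sub_chunk_cache_size},
                          {cache_size, CacheSize},
                          {delta, Delta}]);
        false ->
            ok
    end,
    max(NewCacheSize, 0).
```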

%% @doc Clamp a value to a lower and upper bound.
%% If the value is less than the lower bound, return the lower bound.
%% If the value is greater than the upper bound, return the upper bound.
clamp(Value, LowerBound, UpperBound) ->
	max(LowerBound, min(Value, UpperBound)).
Collaborator:

FYI: ar_util:between/3 also does this: https://github.com/ArweaveTeam/arweave/pull/660/files#diff-1f26621f4ea9ef47e071afce0f4ce9fc3e6bacc76089ac9037c00e0c07a5ce9eR46

Can you add a comment to ar_util:between that people should start using ar_util:clamp instead?
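The requested note could be an EDoc @deprecated tag on between/3. The function body below is an assumption based on the description above (only the linked diff shows the real code); the point is the tag, which EDoc renders as a deprecation notice.

```erlang
-module(between_example).
-export([between/3]).

%% @deprecated Use ar_util:clamp/3 instead; between/3 duplicates it.
between(N, Min, _Max) when N < Min -> Min;
between(N, _Min, Max) when N > Max -> Max;
between(N, _Min, _Max) -> N.
```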

@shizzard (Contributor, Author):

Ah, didn't see the existing one, okay.


4 participants