-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Minimal padding in pool allocator #365
Conversation
Documentation for this branch can be viewed at https://sites.ecmwf.int/docs/loki/365/index.html |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #365 +/- ##
=======================================
Coverage 95.40% 95.40%
=======================================
Files 178 178
Lines 37267 37282 +15
=======================================
+ Hits 35555 35570 +15
Misses 1712 1712
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Looks very good at first glance.
So if you always want to have padding (for whatever reason) you would need to not pass something for horizontal
?! Or to put it another way: Do we want or need this to be optional?
@@ -624,8 +623,9 @@ def _create_stack_allocation(self, stack_ptr, stack_end, ptr_var, arr, stack_siz | |||
dim = Product((dim, _dim)) | |||
arr_type_bytes = InlineCall(Variable(name='C_SIZEOF'), | |||
parameters=as_tuple(self._get_c_sizeof_arg(arr))) | |||
arr_type_bytes = InlineCall(function=Variable(name='MAX'), | |||
parameters=(arr_type_bytes, Literal(8)), kw_parameters=()) | |||
if not any(s in dim for s in self.horizontal.size_expressions): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a comment here explaining why you only pad if no dimension corresponds to a horizontal size expression?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, good spot, this could really do with a comment. Thanks!
The minimal padding could certainly be made optional, but for IFS components we wouldn't add any safety by padding everything and we would hurt performance a lot in single precision builds. So even if it is made optional, I think it should be enabled by default. Whilst I agree that not relying on the horizontal
dimension does make the transformation more general, I don't think an already quite complex transformation needs another option 😄
I'll leave the final call on whether to make it optional up to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed offline and even though I am aware that adapting the tests will be a bit of a torture ... I think you/we should make this another option 😬 Thank you ...
@@ -851,8 +851,6 @@ def create_pool_allocator(self, routine, stack_size): | |||
_real_size_bytes = Cast(name='REAL', expression=Literal(1), kind=_kind) | |||
_real_size_bytes = InlineCall(Variable(name='C_SIZEOF'), | |||
parameters=as_tuple(_real_size_bytes)) | |||
_real_size_bytes = InlineCall(function=Variable(name='MAX'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't thought it through completely yet, but do we really never need to pad here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We pad the individual allocations themselves to ensure correct alignment on device. But here we are just converting the stack size back to bytes from integers. So we should never pad here and we should also never pad the denominator when we convert the stack size from bytes to integers (just before the actual stack allocation).
Confirmed as bit-identical on ecPhys. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Implementation looks good to me. Have you also tested CLOUDSC standalone with this on GPU?
Thanks @reuterbal and @MichaelSt98 for the feedback! @MichaelSt98: As requested, I've made the horizontal dim, and therefore the minimal padding, now optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only had a very high-level look, but seems sensible to me. GTG
Currently, the pool allocator pads every allocation to 8 bytes to avoid potential memory alignment issues on device. This is only strictly necessary for arrays that are not
NPROMA
sized, as this is likely to be a multiple of 8 in any case. If only such arrays are padded, then the memory footprint of the stack can potentially be reduced, especially for single precision builds.Once this PR is reviewed, accepted and merged, could it please be tagged?