
Minimal padding in pool allocator #365

Merged: 3 commits from naan-stack-padding merged into main on Sep 2, 2024
Conversation

@awnawab (Contributor) commented Aug 27, 2024:

Currently, the pool allocator pads every allocation to 8 bytes to avoid potential memory-alignment issues on device. This is only strictly necessary for arrays that are not NPROMA-sized, as NPROMA is likely to be a multiple of 8 in any case. If only such arrays are padded, the memory footprint of the stack can be reduced, especially for single-precision builds.

Once this PR is reviewed, accepted and merged, could it please be tagged?
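As a minimal sketch of the rule being proposed (not Loki's implementation; the alias set HORIZONTAL_SIZES, the helper name stack_increment and the example extents are assumptions for illustration):

```python
HORIZONTAL_SIZES = {'NPROMA', 'KLON'}  # hypothetical aliases of the horizontal size

def stack_increment(dim_names, dim_extents, elem_size_bytes):
    """Stack-size contribution of one temporary array, in bytes.

    The element size is only padded up to 8 bytes (for device alignment)
    when no dimension is horizontal/NPROMA-sized; NPROMA-sized arrays are
    assumed to stay aligned because NPROMA is a multiple of 8 anyway.
    """
    if not any(name in HORIZONTAL_SIZES for name in dim_names):
        elem_size_bytes = max(elem_size_bytes, 8)  # pad non-NPROMA arrays only
    n_elems = 1
    for extent in dim_extents:
        n_elems *= extent
    return n_elems * elem_size_bytes

# A single-precision REAL(4) :: TMP(NPROMA, NLEV) with NPROMA=32, NLEV=137 now
# contributes 32*137*4 bytes instead of the padded 32*137*8, halving its footprint.
print(stack_increment(('NPROMA', 'NLEV'), (32, 137), 4))  # 17536
print(stack_increment(('NLEV',), (137,), 4))              # 1096 (padded to 8 bytes)
```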

Documentation for this branch can be viewed at https://sites.ecmwf.int/docs/loki/365/index.html

@awnawab changed the title from "Naan stack padding" to "Minimal padding in pool allocator" on Aug 27, 2024

codecov bot commented Aug 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.40%. Comparing base (1f87624) to head (c0a856c).
Report is 12 commits behind head on main.

Additional details and impacted files:

```
@@           Coverage Diff           @@
##             main     #365   +/-   ##
=======================================
  Coverage   95.40%   95.40%
=======================================
  Files         178      178
  Lines       37267    37282   +15
=======================================
+ Hits        35555    35570   +15
  Misses       1712     1712
```

| Flag       | Coverage         | Δ            |
|------------|------------------|--------------|
| lint_rules | 96.39% <ø>       | (ø)          |
| loki       | 95.39% <100.00%> | (+<0.01%) ⬆️ |

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

@awnawab marked this pull request as ready for review on August 27, 2024 at 08:44
@MichaelSt98 (Collaborator) left a comment:

Thanks! Looks very good at first glance.

So if you always want padding (for whatever reason), you would need to not pass anything for horizontal? Or, to put it another way: do we want or need this to be optional?

```
@@ -624,8 +623,9 @@ def _create_stack_allocation(self, stack_ptr, stack_end, ptr_var, arr, stack_siz
    dim = Product((dim, _dim))
    arr_type_bytes = InlineCall(Variable(name='C_SIZEOF'),
                                parameters=as_tuple(self._get_c_sizeof_arg(arr)))
    arr_type_bytes = InlineCall(function=Variable(name='MAX'),
                                parameters=(arr_type_bytes, Literal(8)), kw_parameters=())
    if not any(s in dim for s in self.horizontal.size_expressions):
```
Collaborator (inline review comment):

Could you add a comment here explaining why you only pad if no dimension corresponds to a horizontal size expression?

@awnawab (Contributor, Author) commented Aug 27, 2024:

Yes, good spot, this could really do with a comment. Thanks!

The minimal padding could certainly be made optional, but for IFS components we wouldn't add any safety by padding everything, and we would hurt performance a lot in single-precision builds. So even if it is made optional, I think it should be enabled by default. Whilst I agree that not relying on the horizontal dimension does make the transformation more general, I don't think an already quite complex transformation needs another option 😄

I'll leave the final call on whether to make it optional up to you.

Collaborator (inline review comment):

As discussed offline, and even though I am aware that adapting the tests will be a bit of a torture ... I think you/we should make this another option 😬 Thank you ...

```
@@ -851,8 +851,6 @@ def create_pool_allocator(self, routine, stack_size):
    _real_size_bytes = Cast(name='REAL', expression=Literal(1), kind=_kind)
    _real_size_bytes = InlineCall(Variable(name='C_SIZEOF'),
                                  parameters=as_tuple(_real_size_bytes))
    _real_size_bytes = InlineCall(function=Variable(name='MAX'),
```
Collaborator (inline review comment):

I haven't thought it through completely yet, but do we really never need to pad here?

@awnawab (Contributor, Author) replied:

We pad the individual allocations themselves to ensure correct alignment on device, but here we are just converting the stack size from integer units back to bytes. So we should never pad here, and we should also never pad the denominator when we convert the stack size from bytes to integer units (just before the actual stack allocation).
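A minimal sketch of that distinction, with assumed names rather than Loki's API: the stack is counted in units of the default real kind, so the conversions between that count and bytes must use the unpadded element size in both directions.

```python
REAL_SIZE_BYTES = 4  # e.g. single precision; padding this to 8 would double the stack

def stack_size_to_bytes(stack_size_in_reals):
    # Pure unit conversion: no MAX(..., 8) padding belongs here.
    return stack_size_in_reals * REAL_SIZE_BYTES

def bytes_to_stack_size(nbytes):
    # Likewise the denominator must stay unpadded (rounding up to whole units).
    return -(-nbytes // REAL_SIZE_BYTES)

print(stack_size_to_bytes(1000))   # 4000
print(bytes_to_stack_size(4001))   # 1001
```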

@awnawab (Contributor, Author) commented Aug 27, 2024:

Confirmed as bit-identical on ecPhys.

@reuterbal (Collaborator) left a comment:

Thanks! Implementation looks good to me. Have you also tested CLOUDSC standalone with this on GPU?

@awnawab (Contributor, Author) commented Aug 30, 2024:

Thanks @reuterbal and @MichaelSt98 for the feedback!

@MichaelSt98: As requested, I've now made the horizontal dim, and therefore the minimal padding, optional (see the sketch below).
@reuterbal: Yes, I just tested with the standalone CLOUDSC on GPU; it works 👌
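A hypothetical sketch of that optional behaviour (only size_expressions comes from the diff above; Horizontal and needs_padding are made-up names for illustration): without a horizontal dimension, every allocation keeps the safe 8-byte padding; with one supplied, NPROMA-sized arrays skip it.

```python
from dataclasses import dataclass

@dataclass
class Horizontal:
    size_expressions: tuple  # e.g. ('NPROMA', 'KLON')

def needs_padding(dim_names, horizontal=None):
    # Without a horizontal dimension, fall back to padding everything
    # (the previous, always-safe behaviour).
    if horizontal is None:
        return True
    # Otherwise pad only if no dimension matches a horizontal size expression.
    return not any(name in horizontal.size_expressions for name in dim_names)

print(needs_padding(('NPROMA', 'NLEV')))                                  # True
print(needs_padding(('NPROMA', 'NLEV'), Horizontal(('NPROMA', 'KLON'))))  # False
```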

@reuterbal added the "ready for merge" label (This PR has been approved and is ready to be merged) on Aug 30, 2024
@mlange05 (Collaborator) left a comment:

Only had a very high-level look, but seems sensible to me. GTG

@reuterbal merged commit fce5733 into main on Sep 2, 2024
13 checks passed
@reuterbal deleted the naan-stack-padding branch on September 2, 2024 at 13:05