
Minimal padding in pool allocator #365

Merged: 3 commits from naan-stack-padding merged into main on Sep 2, 2024
Conversation

@awnawab (Contributor) commented Aug 27, 2024:

Currently, the pool allocator pads every allocation to 8 bytes to avoid potential memory-alignment issues on device. This is only strictly necessary for arrays that are not NPROMA-sized, as NPROMA is likely to be a multiple of 8 in any case. If only such arrays are padded, the memory footprint of the stack can be reduced, especially for single-precision builds.

Once this PR is reviewed, accepted and merged, could it please be tagged?
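As a minimal sketch of the rule being proposed (not Loki's implementation; the alias set HORIZONTAL_SIZES, the helper name stack_increment and the example extents are assumptions for illustration):

```python
HORIZONTAL_SIZES = {'NPROMA', 'KLON'}  # hypothetical aliases of the horizontal size

def stack_increment(dim_names, dim_extents, elem_size_bytes):
    """Stack-size contribution of one temporary array, in bytes.

    The element size is only padded up to 8 bytes (for device alignment)
    when no dimension is horizontal/NPROMA-sized; NPROMA-sized arrays are
    assumed to stay aligned because NPROMA is a multiple of 8 anyway.
    """
    if not any(name in HORIZONTAL_SIZES for name in dim_names):
        elem_size_bytes = max(elem_size_bytes, 8)  # pad non-NPROMA arrays only
    n_elems = 1
    for extent in dim_extents:
        n_elems *= extent
    return n_elems * elem_size_bytes

# A single-precision REAL(4) :: TMP(NPROMA, NLEV) with NPROMA=32, NLEV=137 now
# contributes 32*137*4 bytes instead of the padded 32*137*8, halving its footprint.
print(stack_increment(('NPROMA', 'NLEV'), (32, 137), 4))  # 17536
print(stack_increment(('NLEV',), (137,), 4))              # 1096 (padded to 8 bytes)
```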

Documentation for this branch can be viewed at https://sites.ecmwf.int/docs/loki/365/index.html

@awnawab changed the title from "Naan stack padding" to "Minimal padding in pool allocator" on Aug 27, 2024

codecov bot commented Aug 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.40%. Comparing base (1f87624) to head (c0a856c).
Report is 12 commits behind head on main.

Additional details and impacted files:

```
@@           Coverage Diff           @@
##             main     #365   +/-   ##
=======================================
  Coverage   95.40%   95.40%
=======================================
  Files         178      178
  Lines       37267    37282   +15
=======================================
+ Hits        35555    35570   +15
  Misses       1712     1712
```

| Flag       | Coverage         | Δ            |
|------------|------------------|--------------|
| lint_rules | 96.39% <ø>       | (ø)          |
| loki       | 95.39% <100.00%> | (+<0.01%) ⬆️ |

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

@awnawab marked this pull request as ready for review on August 27, 2024 at 08:44
@MichaelSt98 (Collaborator) left a comment:

Thanks! Looks very good at first glance.

So if you always want padding (for whatever reason), you would need to not pass anything for horizontal? Or, to put it another way: do we want or need this to be optional?

```
@@ -624,8 +623,9 @@ def _create_stack_allocation(self, stack_ptr, stack_end, ptr_var, arr, stack_siz
    dim = Product((dim, _dim))
    arr_type_bytes = InlineCall(Variable(name='C_SIZEOF'),
                                parameters=as_tuple(self._get_c_sizeof_arg(arr)))
    arr_type_bytes = InlineCall(function=Variable(name='MAX'),
                                parameters=(arr_type_bytes, Literal(8)), kw_parameters=())
    if not any(s in dim for s in self.horizontal.size_expressions):
```
Collaborator (inline review comment):

Could you add a comment here explaining why you only pad if no dimension corresponds to a horizontal size expression?

@awnawab (Contributor, Author) commented Aug 27, 2024:

Yes, good spot, this could really do with a comment. Thanks!

The minimal padding could certainly be made optional, but for IFS components we wouldn't add any safety by padding everything, and we would hurt performance a lot in single-precision builds. So even if it is made optional, I think it should be enabled by default. Whilst I agree that not relying on the horizontal dimension does make the transformation more general, I don't think an already quite complex transformation needs another option 😄

I'll leave the final call on whether to make it optional up to you.

Collaborator (inline review comment):

As discussed offline, and even though I am aware that adapting the tests will be a bit of a torture ... I think you/we should make this another option 😬 Thank you ...

```
@@ -851,8 +851,6 @@ def create_pool_allocator(self, routine, stack_size):
    _real_size_bytes = Cast(name='REAL', expression=Literal(1), kind=_kind)
    _real_size_bytes = InlineCall(Variable(name='C_SIZEOF'),
                                  parameters=as_tuple(_real_size_bytes))
    _real_size_bytes = InlineCall(function=Variable(name='MAX'),
```
Collaborator (inline review comment):

I haven't thought it through completely yet, but do we really never need to pad here?

@awnawab (Contributor, Author) replied:

We pad the individual allocations themselves to ensure correct alignment on device, but here we are just converting the stack size from integer units back to bytes. So we should never pad here, and we should also never pad the denominator when we convert the stack size from bytes to integer units (just before the actual stack allocation).
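A minimal sketch of that distinction, with assumed names rather than Loki's API: the stack is counted in units of the default real kind, so the conversions between that count and bytes must use the unpadded element size in both directions.

```python
REAL_SIZE_BYTES = 4  # e.g. single precision; padding this to 8 would double the stack

def stack_size_to_bytes(stack_size_in_reals):
    # Pure unit conversion: no MAX(..., 8) padding belongs here.
    return stack_size_in_reals * REAL_SIZE_BYTES

def bytes_to_stack_size(nbytes):
    # Likewise the denominator must stay unpadded (rounding up to whole units).
    return -(-nbytes // REAL_SIZE_BYTES)

print(stack_size_to_bytes(1000))   # 4000
print(bytes_to_stack_size(4001))   # 1001
```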

@awnawab (Contributor, Author) commented Aug 27, 2024:

Confirmed as bit-identical on ecPhys.

@reuterbal (Collaborator) left a comment:

Thanks! Implementation looks good to me. Have you also tested CLOUDSC standalone with this on GPU?

@awnawab (Contributor, Author) commented Aug 30, 2024:

Thanks @reuterbal and @MichaelSt98 for the feedback!

@MichaelSt98: As requested, I've now made the horizontal dim, and therefore the minimal padding, optional (see the sketch below).
@reuterbal: Yes, I just tested with the standalone CLOUDSC on GPU; it works 👌
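A hypothetical sketch of that optional behaviour (only size_expressions comes from the diff above; Horizontal and needs_padding are made-up names for illustration): without a horizontal dimension, every allocation keeps the safe 8-byte padding; with one supplied, NPROMA-sized arrays skip it.

```python
from dataclasses import dataclass

@dataclass
class Horizontal:
    size_expressions: tuple  # e.g. ('NPROMA', 'KLON')

def needs_padding(dim_names, horizontal=None):
    # Without a horizontal dimension, fall back to padding everything
    # (the previous, always-safe behaviour).
    if horizontal is None:
        return True
    # Otherwise pad only if no dimension matches a horizontal size expression.
    return not any(name in horizontal.size_expressions for name in dim_names)

print(needs_padding(('NPROMA', 'NLEV')))                                  # True
print(needs_padding(('NPROMA', 'NLEV'), Horizontal(('NPROMA', 'KLON'))))  # False
```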

@reuterbal added the "ready for merge" label (This PR has been approved and is ready to be merged) on Aug 30, 2024
@mlange05 (Collaborator) left a comment:

Only had a very high-level look, but seems sensible to me. GTG

@reuterbal merged commit fce5733 into main on Sep 2, 2024
13 checks passed
@reuterbal deleted the naan-stack-padding branch on September 2, 2024 at 13:05