Issue with discrete boundary condition accessing model_fields on distributed GPU setup #5461
Replies: 4 comments
@phyo-wai-thaw can you try using this boundary condition?

```julia
@inline function top_bc_func(i, j, grid, clock, model_fields, parameters)
    k = grid.Nz
    T1 = @inbounds model_fields.T[i, j, k]
    return T1 * parameters.scale
end
```

You're missing an `@inbounds`; not sure if this is the exact problem, but it looks like you're facing a compilation failure. The other suggestion is to try on
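For context, a discrete-form function like this is typically wrapped in a boundary condition and attached to a field roughly as in the sketch below; the grid dimensions, the `scale` value, and the use of a tracer `T` in a `NonhydrostaticModel` are illustrative assumptions rather than details from the original setup.

```julia
using Oceananigans

# Discrete-form boundary condition: receives grid indices and the model fields
# directly, so the surface value of T can be read at k = grid.Nz.
@inline function top_bc_func(i, j, grid, clock, model_fields, parameters)
    k = grid.Nz
    T1 = @inbounds model_fields.T[i, j, k]
    return T1 * parameters.scale
end

# Placeholder grid; the size and extent are illustrative only.
grid = RectilinearGrid(GPU(), size=(32, 32, 32), extent=(1, 1, 1))

# Flag the discrete form explicitly and pass the parameters the function expects.
top_T_bc = FluxBoundaryCondition(top_bc_func, discrete_form=true, parameters=(scale=0.1,))
T_bcs = FieldBoundaryConditions(top=top_T_bc)

# Attach the boundary conditions to the tracer T when building the model.
model = NonhydrostaticModel(; grid, tracers=:T, boundary_conditions=(; T=T_bcs))
```

With `discrete_form=true` the function is called with `(i, j, grid, clock, model_fields, parameters)` rather than interpolated coordinates, which is what makes the direct `model_fields.T[i, j, k]` access possible.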
@phyo-wai-thaw this step is done automatically when constructing the architecture. It should not be needed, and we recommend not doing it manually, since it is wrong for multi-node communication where the size of the communicator is greater than the number of GPUs per node.
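As a rough illustration of that point, the sketch below sets up a distributed GPU run and leaves device selection entirely to the architecture constructor; the `Distributed(GPU())` constructor and the `Oceananigans.DistributedComputations` module name reflect recent Oceananigans versions and may differ in older releases, and the grid parameters are placeholders.

```julia
using MPI
using Oceananigans
using Oceananigans.DistributedComputations

# Initialize MPI if it has not been initialized already.
MPI.Initialized() || MPI.Init()

# Constructing the distributed architecture assigns a GPU to each MPI rank.
# No manual CUDA.device!(...) call is needed; doing it by hand can pick the
# wrong device when the communicator spans more ranks than GPUs per node.
arch = Distributed(GPU())

# Build the grid on the distributed architecture as usual.
grid = RectilinearGrid(arch, size=(64, 64, 32), extent=(1, 1, 1))
```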
Thanks for your suggestions. I tested again as suggested, but I still encounter the same issue when using distributed GPUs. I also reproduced the issue on distributed CPUs. Here is the full error message:
The issue has now been resolved by updating to the latest versions of Oceananigans, MPI, and CUDA. Thanks a lot for your help!
Hello!
I'm running into an issue with a discrete boundary condition that accesses `model_fields`: it works correctly on a single GPU but fails to compile when using distributed GPUs.
Any suggestions would be greatly appreciated.
Thanks a lot!
Environment (NeSI HPC)
SLURM script submitted on NeSI
Minimal Working Example
Error