memory problems with CUDA-based rings #138
Comments
Bifrost asynchronicity is based around CPU threads each having their own CUDA stream. All GPU work in a CPU thread must be synchronous with respect to that thread, so it must be followed by a stream synchronize before anything is released to other threads. (Using async CUDA APIs and then synchronizing on a per-CPU-thread stream ensures that GPU work is synchronous within the CPU thread but asynchronous between threads.) E.g., the pipeline infrastructure does this for all blocks.
Ok, thanks.
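To make the per-thread stream idea concrete, here is a minimal pure-Python sketch. It is not Bifrost's code and makes no CUDA calls: `FakeStream`, `current_stream`, and `block` are hypothetical names, and a single-worker thread pool stands in for a CUDA stream (work runs in order, off the calling thread). Each CPU thread gets its own stream, enqueues work asynchronously, and synchronizes before handing its result to anyone else.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class FakeStream:
    """Hypothetical stand-in for a CUDA stream: in-order, off-thread execution."""
    def __init__(self):
        self._worker = ThreadPoolExecutor(max_workers=1)
        self._pending = []

    def enqueue(self, fn):
        # Returns immediately, like an async CUDA API call.
        self._pending.append(self._worker.submit(fn))

    def synchronize(self):
        # Block the calling thread until all enqueued work has finished.
        while self._pending:
            self._pending.pop(0).result()

_tls = threading.local()

def current_stream():
    """One stream per CPU thread (assumed model of the scheme described above)."""
    if not hasattr(_tls, "stream"):
        _tls.stream = FakeStream()
    return _tls.stream

def block(name, data, results):
    stream = current_stream()
    out = [0] * len(data)

    def kernel():
        out[:] = [2 * x for x in data]

    stream.enqueue(kernel)      # asynchronous with respect to other threads
    stream.synchronize()        # synchronous with respect to this thread
    results[name] = out         # only now is the output released

results = {}
threads = [
    threading.Thread(target=block, args=(name, data, results))
    for name, data in [("a", [1, 2]), ("b", [3, 4])]
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because every thread synchronizes its own stream before publishing, the two blocks can overlap with each other while each one still sees its own GPU work as complete.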
Dummysource replaces the Ethernet input for throughput testing, and is enabled with the command-line switch --fakesource. Add xGPU averaging and subselection. The former has been "tested" in that it outputs appropriate data when the pipeline is fed with all ones. With all threads active, the pipeline runs at ~40Gb/s on my old Xeon machine, with processing seemingly limited by my RTX 2060 GPU. NB: Probably some synchronization barriers are needed, certainly on the block which copies data to the GPU. See ledatelescope/bifrost#138
When blocks write to a ring across the CPU/GPU boundary, this copy is [I think] asynchronous, and needs to be synchronized before the destination buffer is marked as ready for consumption by downstream consumers. See ledatelescope/bifrost#138
A couple of times now I have run into problems passing data between blocks using CUDA-based rings. If I don't force a bifrost.device.synchronize_stream() within the reserve context for the ring, I end up with inconsistent results reading from the ring in another block. I think what is happening is that the ring doesn't know about the asynchronous copies and happily marks the reserved segment as good to go when the reserve is released. Is there a better way to deal with this than sprinkling synchronize_stream() calls around?