-
Describe the bug
I might be missing something really obvious, or it might have to do with the systemd unit, but I wanted to check whether anyone knows why this is happening.

Steps To Reproduce

Expected behavior

System Information

Additional context
Nothing shows up in the stderrgpudetect file.
-
If you run But the client is supposed to keep going even if the GPU detection subprocess fails.
-
Try
-
Just so I'm not exiting prematurely, how long should this take at most?
-
“Not that long”, probably… (I don’t know the real answer.) But if the detection hangs, the client will also hang, because it waits forever for the detection to complete. A timeout here would clearly be a good idea.
-
OK, it's just hanging then, both when running as the boinc user and as root. I never get that coproc_info.xml file that @davidpanderson was looking for. I have added the boinc user to the video group, but maybe it's a permissions issue separate from that? Not sure.
-
Not sure what else to suggest. You could try the forum; maybe somebody there has some ideas.

During detection, warning messages are stored for reporting back to the client on completion. It would be useful if those also got written to the detector process’s

Also, in case anybody’s wondering:
-
Do you have these packages installed?
- boinc-client-nvidia-cuda
- boinc-client-opencl
- mesa-opencl-icd
- ocl-icd-libopencl1
On Fri, 7 Apr 2023 at 19:52, Eli T. Drumm ***@***.***> wrote:
> OK, it's just hanging then. Both running as boinc user and as root user. I never get that coproc_info.xml file that @davidpanderson was looking for. I have added the boinc user to the video group, but maybe it's a permissions issue separate from that? Not sure.
--
Best regards,
Vitalii Koshura
-
boinc --detect_gpus shouldn't hang even if expected libraries are missing. I'll implement Brian's idea of writing warnings to stderr; that may shed some light.
-
And/or some tracepoints to track progress through the code. It’s not as though there’s any need to keep the noise down in that file…
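For illustration, the tracepoint idea could look like the minimal C sketch below. `trace_to` and `TRACE` are hypothetical names, not BOINC functions, and the `[gpu_detect]` prefix is made up. The point is only that each message is flushed as soon as it is written, so the last line in the file shows how far detection got before a hang.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper (not BOINC code): write one tagged progress line to
 * `out` and flush it at once, so the line is on disk even if the process
 * hangs in the very next call. */
void trace_to(FILE *out, const char *file, int line, const char *fmt, ...) {
    va_list ap;
    fprintf(out, "[gpu_detect] %s:%d: ", file, line);
    va_start(ap, fmt);
    vfprintf(out, fmt, ap);
    va_end(ap);
    fputc('\n', out);
    fflush(out);  /* stderr is normally unbuffered; this covers other streams */
}

/* Convenience wrapper that records the call site and targets stderr. */
#define TRACE(...) trace_to(stderr, __FILE__, __LINE__, __VA_ARGS__)
```

A call such as `TRACE("starting OpenCL detection");` placed before each detection step would then leave a trail in the detector's stderr output.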
-
I added the stderr writes in a 'dpa_gpu_detect' branch.
-
BTW it should take << 1 sec to complete.
-
@AenBleidd So I'm on Arch, which is probably not officially supported, but I have the equivalents of those libraries installed according to the Arch Wiki.

@davidpanderson I can try to clone that branch, build, and report back.
-
OK, so with that enabled (plus a few "test" fprintfs from me) here's what I get:

So I guess the error is happening sometime after the

Update: it's something in the OpenCL detection. I'll keep digging.

Update: It's a problem with this line of code:

ciErrNum = (*p_clGetPlatformIDs)(MAX_OPENCL_PLATFORMS, platforms, &num_platforms);

Does this mean I need to get in touch with the OpenCL people? ChatGPT tells me this is calling the
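One way to tell whether the hang is in the OpenCL runtime rather than in BOINC is to make the same call from a small standalone program. The following is a minimal sketch, assuming a Linux system where the OpenCL ICD loader is installed as `libOpenCL.so.1`. `probe_opencl_platforms` and `MAX_PLATFORMS` are made-up names, and the plain integer and pointer types stand in for `cl_int`, `cl_uint` and `cl_platform_id` so that no OpenCL headers are needed.

```c
#include <dlfcn.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_PLATFORMS 16  /* arbitrary limit for this sketch */

/* Shape of clGetPlatformIDs: cl_int f(cl_uint, cl_platform_id *, cl_uint *). */
typedef int32_t (*get_platform_ids_fn)(uint32_t, void **, uint32_t *);

/* Hypothetical probe (not BOINC code). Returns the number of platforms,
 * -1 if the library or symbol cannot be loaded, or -2 if the call itself
 * reports an error. */
int probe_opencl_platforms(const char *libpath) {
    void *lib = dlopen(libpath, RTLD_NOW);
    if (!lib) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return -1;
    }
    get_platform_ids_fn fn = (get_platform_ids_fn)dlsym(lib, "clGetPlatformIDs");
    if (!fn) {
        fprintf(stderr, "clGetPlatformIDs not found\n");
        dlclose(lib);
        return -1;
    }
    void *platforms[MAX_PLATFORMS];
    uint32_t num_platforms = 0;
    fprintf(stderr, "calling clGetPlatformIDs...\n");
    int32_t err = fn(MAX_PLATFORMS, platforms, &num_platforms);
    fprintf(stderr, "returned %d, %u platform(s)\n", (int)err, (unsigned)num_platforms);
    /* The library is deliberately left loaded; a probe exits right after. */
    return err == 0 ? (int)num_platforms : -2;  /* 0 is CL_SUCCESS */
}
```

If a `main` that calls `probe_opencl_platforms("libOpenCL.so.1")` prints the first message but never the second, the hang is in the installed OpenCL stack and not in the client. On older glibc versions the program needs to be linked with `-ldl`.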
-
OK, well, by following the instructions in this thread (uninstalling some Intel stuff) I was able to get it to work. I guess the problem is with oneAPI. Sorry about all this, but thank you for the help!
-
@etdr, thank you for testing this and sharing the solution.
-
The root cause might lie elsewhere, but to the user it looks like a bug in BOINC. The client could handle this situation better, with both improved diagnostics and a timeout when waiting for the detection to complete.
-
Probably the easiest thing would be that if the GPU detection process hasn't exited in 10 seconds,
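As a rough illustration of the timeout idea, here is a minimal POSIX C sketch that starts a child process, polls for its exit, and kills it once a deadline passes. `run_with_timeout` is a hypothetical helper, not how the BOINC client actually launches detection, and the return codes and the 100 ms polling interval are arbitrary choices for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper (not BOINC code): run argv[0] with arguments argv and
 * wait at most timeout_sec seconds for it to finish.
 * Returns the child's exit status (0-255), -1 on a fork/wait error or an
 * abnormal exit, and -2 if the child had to be killed after the timeout. */
int run_with_timeout(char *const argv[], int timeout_sec) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);  /* only reached if exec failed */
    }
    struct timespec nap = {0, 100 * 1000 * 1000};  /* poll every 100 ms */
    int polls_left = timeout_sec * 10;
    for (;;) {
        int status;
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid) return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
        if (r < 0 && errno != EINTR) return -1;
        if (polls_left-- <= 0) break;  /* deadline reached */
        nanosleep(&nap, NULL);
    }
    kill(pid, SIGKILL);     /* give up on the hung child */
    waitpid(pid, NULL, 0);  /* reap it so no zombie is left behind */
    return -2;
}
```

With an argument vector of `{"boinc", "--detect_gpus", NULL}` and a 10 second limit, a result of -2 would mean detection hung, and the caller could log that and carry on without GPU information instead of waiting forever.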