diff --git a/sycl/doc/design/GlobalObjectsInRuntime.md b/sycl/doc/design/GlobalObjectsInRuntime.md index b56dd7767d108..d3c66a4f6ec0e 100644 --- a/sycl/doc/design/GlobalObjectsInRuntime.md +++ b/sycl/doc/design/GlobalObjectsInRuntime.md @@ -55,9 +55,65 @@ Deinitialization is platform-specific. Upon application shutdown, the DPC++ runtime frees memory pointed by `GlobalHandler` global pointer, which triggers destruction of nested `std::unique_ptr`s. +### Shutdown Tasks and Challenges + +As the user's app ends, SYCL's primary goal is to release any UR adapters that +have been gotten, and teardown the plugins/adapters themselves. Additionally, +we need to stop deferring any new buffer releases and clean up any memory +whose release was deferred. + +To this end, the shutdown occurs in two phases: early and late. The purpose +for early shutdown is primarily to stop any further deferring of memory release. +This is because the deferred memory release is based on threads and on Windows +the threads will be abandoned. So as soon as possible we want to stop deferring +memory and try to let go any that has been deferred. The purpose for late +shutdown is to hold onto the handles and adapters longer than the user's +application. We don't want to initiate late shutdown until after all the users +static and thread local vars have been destroyed, in case those destructors are +calling SYCL. + +In the early shutdown we stop deferring, tell the scheduler to prepare for release, and +try releasing the memory that has been deferred so far. Following this, if +the user has any global or static handles to sycl objects, they'll be destroyed. +Finally, the late shutdown routine is called the last of the UR handles and +adapters are let go, as is the GlobalHandler itself. + + +#### Threads +The deferred memory marshalling is built on a thread pool, but there is a +challenge here in that on Windows, once the end of the users main() is reached +and their app is shutting down, the Windows OS will abandon all remaining +in-flight threads. These threads can be .join() but they simply return instantly, +the threads are not completed. Further any thread specific variables +(or thread_local static vars) will NOT have their destructors called. Note +that the standard while-loop-over-condition-var pattern will cause a hang - +we cannot "wait" on abandoned threads. +On Windows, short of adding some user called API to signal this, there is +no way to detect or avoid this. None of the "end-of-library" lifecycle events +occurs before the threads are abandoned. ( not std::atexit(), not globals or +static, or static thread_local var destruction, not DllMain(DLL_PROCESS_DETACH) ) +This means that on Windows, once we arrive at shutdown_early we cannot wait on +host events or the thread pool. + +For the deferred memory itself, there is no issue here. The Windows OS will +reclaim the memory for us. The issue of which we must be wary is placing UR +handles (and similar) in host threads. The RAII mechanism of unique and +shared pointers will not work in any thread that is abandoned on Windows. + +One last note about threads. It is entirely the OS's discretion when to +start or schedule a thread. If the main process is very busy then it is +possible that threads the SYCL library creates (host_tasks/thread_pool) +won't even be started until AFTER the host application main() function is done. +This is not a normal occurrence, but it can happen if there is no call to queue.wait() + + ### Linux -On Linux DPC++ runtime uses `__attribute__((destructor))` property with low +On Linux, the "early_shutdown()" is begun by the destruction of a static +StaticVarShutdownHandler object, which is initialized by +platform::get_platforms(). + +late_shutdown() timing uses `__attribute__((destructor))` property with low priority value 110. This approach does not guarantee, that `GlobalHandler` destructor is the last thing to run, as user code may contain a similar function with the same priority value. At the same time, users may specify priorities @@ -72,10 +128,14 @@ times, the memory leak may impact code performance. ### Windows -To identify shutdown moment on Windows, DPC++ runtime uses default `DllMain` -function with `DLL_PROCESS_DETACH` reason. This guarantees, that global objects -deinitialization happens right before `sycl.dll` is unloaded from process -address space. +Differing from Linux, on Windows the "early_shutdown()" is begun by +DllMain(PROCESS_DETACH), unless statically linked. + +The "late_shutdown()" is begun by the destruction of a +static StaticVarShutdownHandler object, which is initialized by +platform::get_platforms(). ( On linux, this is when we do "early_shutdown()". +Go figure.) This is as late as we can manage, but it is later than any user +application global, static, or thread_local variable destruction. ### Recommendations for DPC++ runtime developers @@ -109,8 +169,7 @@ for (adapter in initializedAdapters) { urLoaderTearDown(); ``` -Which in turn is called by either `shutdown_late()` or `shutdown_win()` -depending on platform. +Which in turn is called by `shutdown_late()`. ![](images/adapter-lifetime.jpg) diff --git a/sycl/source/detail/global_handler.cpp b/sycl/source/detail/global_handler.cpp index 078da1aab83ba..f1cfe7af2cfca 100644 --- a/sycl/source/detail/global_handler.cpp +++ b/sycl/source/detail/global_handler.cpp @@ -37,9 +37,11 @@ using LockGuard = std::lock_guard; SpinLock GlobalHandler::MSyclGlobalHandlerProtector{}; // forward decl -void shutdown_win(); // TODO: win variant will go away soon void shutdown_early(); void shutdown_late(); +#ifdef _WIN32 +BOOL isLinkedStatically(); +#endif // Utility class to track references on object. // Used for GlobalHandler now and created as thread_local object on the first @@ -60,10 +62,6 @@ class ObjectUsageCounter { LockGuard Guard(GlobalHandler::MSyclGlobalHandlerProtector); MCounter--; - GlobalHandler *RTGlobalObjHandler = GlobalHandler::getInstancePtr(); - if (RTGlobalObjHandler) { - RTGlobalObjHandler->prepareSchedulerToRelease(!MCounter); - } } catch (std::exception &e) { __SYCL_REPORT_EXCEPTION_TO_STREAM("exception in ~ObjectUsageCounter", e); } @@ -254,24 +252,37 @@ void GlobalHandler::releaseDefaultContexts() { MPlatformToDefaultContextCache.Inst.reset(nullptr); } -struct EarlyShutdownHandler { - ~EarlyShutdownHandler() { +// Shutdown is split into two parts. shutdown_early() stops any more +// objects from being deferred and takes an initial pass at freeing them. +// shutdown_late() finishes and releases the adapters and the GlobalHandler. +// For Windows, early shutdown is typically called from DllMain, +// and late shutdown is here. +// For Linux, early shutdown is here, and late shutdown is called from +// a low priority destructor. +struct StaticVarShutdownHandler { + + ~StaticVarShutdownHandler() { try { #ifdef _WIN32 - // on Windows we keep to the existing shutdown procedure - GlobalHandler::instance().releaseDefaultContexts(); + // If statically linked, DllMain will not be called. So we do its work + // here. + if (isLinkedStatically()) { + shutdown_early(); + } + + shutdown_late(); #else shutdown_early(); #endif } catch (std::exception &e) { - __SYCL_REPORT_EXCEPTION_TO_STREAM("exception in ~EarlyShutdownHandler", - e); + __SYCL_REPORT_EXCEPTION_TO_STREAM( + "exception in ~StaticVarShutdownHandler", e); } } }; -void GlobalHandler::registerEarlyShutdownHandler() { - static EarlyShutdownHandler handler{}; +void GlobalHandler::registerStaticVarShutdownHandler() { + static StaticVarShutdownHandler handler{}; } bool GlobalHandler::isOkToDefer() const { return OkToDefer; } @@ -304,10 +315,10 @@ void GlobalHandler::prepareSchedulerToRelease(bool Blocking) { #ifndef _WIN32 if (Blocking) drainThreadPool(); +#endif if (MScheduler.Inst) MScheduler.Inst->releaseResources(Blocking ? BlockingT::BLOCKING : BlockingT::NON_BLOCKING); -#endif } void GlobalHandler::drainThreadPool() { @@ -315,24 +326,18 @@ void GlobalHandler::drainThreadPool() { MHostTaskThreadPool.Inst->drain(); } -#ifdef _WIN32 -// because of something not-yet-understood on Windows -// threads may be shutdown once the end of main() is reached -// making an orderly shutdown difficult. Fortunately, Windows -// itself is very aggressive about reclaiming memory. Thus, -// we focus solely on unloading the adapters, so as to not -// accidentally retain device handles. etc -void shutdown_win() { - GlobalHandler *&Handler = GlobalHandler::getInstancePtr(); - Handler->unloadAdapters(); -} -#else void shutdown_early() { const LockGuard Lock{GlobalHandler::MSyclGlobalHandlerProtector}; GlobalHandler *&Handler = GlobalHandler::getInstancePtr(); if (!Handler) return; +#if defined(XPTI_ENABLE_INSTRUMENTATION) && defined(_WIN32) + if (xptiTraceEnabled()) + return; // When doing xpti tracing, we can't safely shutdown on Win. + // TODO: figure out why XPTI prevents release. +#endif + // Now that we are shutting down, we will no longer defer MemObj releases. Handler->endDeferredRelease(); @@ -354,6 +359,12 @@ void shutdown_late() { if (!Handler) return; +#if defined(XPTI_ENABLE_INSTRUMENTATION) && defined(_WIN32) + if (xptiTraceEnabled()) + return; // When doing xpti tracing, we can't safely shutdown on Win. + // TODO: figure out why XPTI prevents release. +#endif + // First, release resources, that may access adapters. Handler->MPlatformCache.Inst.reset(nullptr); Handler->MScheduler.Inst.reset(nullptr); @@ -370,7 +381,6 @@ void shutdown_late() { delete Handler; Handler = nullptr; } -#endif #ifdef _WIN32 extern "C" __SYCL_EXPORT BOOL WINAPI DllMain(HINSTANCE hinstDLL, @@ -391,23 +401,18 @@ extern "C" __SYCL_EXPORT BOOL WINAPI DllMain(HINSTANCE hinstDLL, if (PrintUrTrace) std::cout << "---> DLL_PROCESS_DETACH syclx.dll\n" << std::endl; -#ifdef XPTI_ENABLE_INSTRUMENTATION - if (xptiTraceEnabled()) - return TRUE; // When doing xpti tracing, we can't safely call shutdown. - // TODO: figure out what XPTI is doing that prevents - // release. -#endif - try { - shutdown_win(); + shutdown_early(); } catch (std::exception &e) { - __SYCL_REPORT_EXCEPTION_TO_STREAM("exception in shutdown_win", e); + __SYCL_REPORT_EXCEPTION_TO_STREAM("exception in DLL_PROCESS_DETACH", e); return FALSE; } + break; case DLL_PROCESS_ATTACH: if (PrintUrTrace) std::cout << "---> DLL_PROCESS_ATTACH syclx.dll\n" << std::endl; + break; case DLL_THREAD_ATTACH: break; @@ -416,6 +421,29 @@ extern "C" __SYCL_EXPORT BOOL WINAPI DllMain(HINSTANCE hinstDLL, } return TRUE; // Successful DLL_PROCESS_ATTACH. } +BOOL isLinkedStatically() { + // If the exePath is the same as the dllPath, + // or if the module handle for DllMain is not retrievable, + // then we are linked statically + // Otherwise we are dynamically linked or loaded. + HMODULE hModule = nullptr; + auto LpModuleAddr = reinterpret_cast(&DllMain); + if (!GetModuleHandleExA(GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS, LpModuleAddr, + &hModule)) { + return true; // not retrievable, therefore statically linked + } else { + char dllPath[MAX_PATH]; + if (GetModuleFileNameA(hModule, dllPath, MAX_PATH)) { + char exePath[MAX_PATH]; + if (GetModuleFileNameA(NULL, exePath, MAX_PATH)) { + if (std::string(dllPath) == std::string(exePath)) { + return true; // paths identical, therefore statically linked + } + } + } + } + return false; // Otherwise dynamically linked or loaded +} #else // Setting low priority on destructor ensures it runs after all other global // destructors. Priorities 0-100 are reserved by the compiler. The priority diff --git a/sycl/source/detail/global_handler.hpp b/sycl/source/detail/global_handler.hpp index ef5bc32c0db0d..1906e4c055d8e 100644 --- a/sycl/source/detail/global_handler.hpp +++ b/sycl/source/detail/global_handler.hpp @@ -73,7 +73,7 @@ class GlobalHandler { XPTIRegistry &getXPTIRegistry(); ThreadPool &getHostTaskThreadPool(); - static void registerEarlyShutdownHandler(); + static void registerStaticVarShutdownHandler(); bool isOkToDefer() const; void endDeferredRelease(); @@ -95,7 +95,6 @@ class GlobalHandler { bool OkToDefer = true; - friend void shutdown_win(); friend void shutdown_early(); friend void shutdown_late(); friend class ObjectUsageCounter; diff --git a/sycl/source/detail/host_task.hpp b/sycl/source/detail/host_task.hpp index 5f7ae11c6a0e4..f7e3feff8d0ef 100644 --- a/sycl/source/detail/host_task.hpp +++ b/sycl/source/detail/host_task.hpp @@ -13,6 +13,7 @@ #pragma once #include +#include #include #include #include @@ -33,6 +34,10 @@ class HostTask { bool isInteropTask() const { return !!MInteropTask; } void call(HostProfilingInfo *HPI) { + if (!GlobalHandler::instance().isOkToDefer()) { + return; + } + if (HPI) HPI->start(); MHostTask(); @@ -41,6 +46,10 @@ class HostTask { } void call(HostProfilingInfo *HPI, interop_handle handle) { + if (!GlobalHandler::instance().isOkToDefer()) { + return; + } + if (HPI) HPI->start(); MInteropTask(handle); diff --git a/sycl/source/detail/platform_impl.cpp b/sycl/source/detail/platform_impl.cpp index b27b95c1f1938..2d454bb09e7c2 100644 --- a/sycl/source/detail/platform_impl.cpp +++ b/sycl/source/detail/platform_impl.cpp @@ -166,7 +166,7 @@ std::vector platform_impl::get_platforms() { // This initializes a function-local variable whose destructor is invoked as // the SYCL shared library is first being unloaded. - GlobalHandler::registerEarlyShutdownHandler(); + GlobalHandler::registerStaticVarShutdownHandler(); return Platforms; } diff --git a/sycl/source/detail/program_manager/program_manager.cpp b/sycl/source/detail/program_manager/program_manager.cpp index 88ef821e7ad70..3bbd59169526e 100644 --- a/sycl/source/detail/program_manager/program_manager.cpp +++ b/sycl/source/detail/program_manager/program_manager.cpp @@ -3846,7 +3846,9 @@ extern "C" void __sycl_register_lib(sycl_device_binaries desc) { // Executed as a part of current module's (.exe, .dll) static initialization extern "C" void __sycl_unregister_lib(sycl_device_binaries desc) { // Partial cleanup is not necessary at shutdown +#ifndef _WIN32 if (!sycl::detail::GlobalHandler::instance().isOkToDefer()) return; sycl::detail::ProgramManager::getInstance().removeImages(desc); +#endif } diff --git a/sycl/source/detail/queue_impl.hpp b/sycl/source/detail/queue_impl.hpp index 5db9d7e75305e..3dda89e58a154 100644 --- a/sycl/source/detail/queue_impl.hpp +++ b/sycl/source/detail/queue_impl.hpp @@ -264,7 +264,17 @@ class queue_impl { destructorNotification(); #endif throw_asynchronous(); - getAdapter()->call(MQueue); + auto status = + getAdapter()->call_nocheck(MQueue); + // If loader is already closed, it'll return a not-initialized status + // which the UR should convert to SUCCESS code. But that isn't always + // working on Windows. This is a temporary workaround until that is fixed. + // TODO: Remove this workaround when UR is fixed, and restore + // ->call<>() instead of ->call_nocheck<>() above. + if (status != UR_RESULT_SUCCESS && + status != UR_RESULT_ERROR_UNINITIALIZED) { + __SYCL_CHECK_UR_CODE_NO_EXC(status); + } } catch (std::exception &e) { __SYCL_REPORT_EXCEPTION_TO_STREAM("exception in ~queue_impl", e); } diff --git a/sycl/source/detail/scheduler/scheduler.cpp b/sycl/source/detail/scheduler/scheduler.cpp index eb52b490efd71..17b972e50da30 100644 --- a/sycl/source/detail/scheduler/scheduler.cpp +++ b/sycl/source/detail/scheduler/scheduler.cpp @@ -279,7 +279,20 @@ bool Scheduler::removeMemoryObject(detail::SYCLMemObjI *MemObj, // No operations were performed on the mem object return true; - { +#ifdef _WIN32 + // If we are shutting down on Windows it may not be + // safe to wait on host threads, as the OS may + // abandon them. But no worries, the memory WILL be reclaimed. + bool allowWait = + MemObj->hasUserDataPtr() || GlobalHandler::instance().isOkToDefer(); + if (!allowWait) { + StrictLock = false; + } +#else + bool allowWait = true; +#endif + + if (allowWait) { // This only needs a shared mutex as it only involves enqueueing and // awaiting for events ReadLockT Lock = StrictLock ? ReadLockT(MGraphLock) @@ -289,10 +302,20 @@ bool Scheduler::removeMemoryObject(detail::SYCLMemObjI *MemObj, waitForRecordToFinish(Record, Lock); } { + // If allowWait is false, it means the application is shutting down. + // On Windows we can't safely wait on threads, because they have likely been + // abandoned. So we will try to get the lock. If we can, great, we'll remove + // the record. But if we can't, we just skip. The OS will reclaim the + // memory. WriteLockT Lock = StrictLock ? acquireWriteLock() : WriteLockT(MGraphLock, std::try_to_lock); - if (!Lock.owns_lock()) - return false; + if (!Lock.owns_lock()) { + + if (allowWait) + return false; // Record was not removed, the caller may try again. + else + return true; // skip. + } MGraphBuilder.decrementLeafCountersForRecord(Record); MGraphBuilder.cleanupCommandsForRecord(Record); MGraphBuilder.removeRecordForMemObj(MemObj); @@ -575,11 +598,10 @@ void Scheduler::cleanupAuxiliaryResources(BlockingT Blocking) { std::unique_lock Lock{MAuxiliaryResourcesMutex}; for (auto It = MAuxiliaryResources.begin(); It != MAuxiliaryResources.end();) { - const EventImplPtr &Event = It->first; if (Blocking == BlockingT::BLOCKING) { - Event->waitInternal(); + It->first->waitInternal(); It = MAuxiliaryResources.erase(It); - } else if (Event->isCompleted()) + } else if (It->first->isCompleted()) It = MAuxiliaryResources.erase(It); else ++It; diff --git a/sycl/source/detail/thread_pool.hpp b/sycl/source/detail/thread_pool.hpp index 50240e0a98b06..e9d441d6d27d1 100644 --- a/sycl/source/detail/thread_pool.hpp +++ b/sycl/source/detail/thread_pool.hpp @@ -75,7 +75,9 @@ class ThreadPool { ~ThreadPool() { try { +#ifndef _WIN32 finishAndWait(); +#endif } catch (std::exception &e) { __SYCL_REPORT_EXCEPTION_TO_STREAM("exception in ~ThreadPool", e); } diff --git a/sycl/test-e2e/DeprecatedFeatures/queue_old_interop.cpp b/sycl/test-e2e/DeprecatedFeatures/queue_old_interop.cpp index ab9059ce98976..a051225a6af20 100644 --- a/sycl/test-e2e/DeprecatedFeatures/queue_old_interop.cpp +++ b/sycl/test-e2e/DeprecatedFeatures/queue_old_interop.cpp @@ -1,5 +1,5 @@ // RUN: %{build} -D__SYCL_INTERNAL_API -o %t.out -// RUN: %{run-unfiltered-devices} %t.out +// RUN: %{run} %t.out //==-------- queue_old_interop.cpp - SYCL queue OpenCL interop test --------==// // diff --git a/sycl/test-e2e/Regression/static-buffer-dtor.cpp b/sycl/test-e2e/Regression/static-buffer-dtor.cpp index 474c2b756d3fe..0226487918253 100644 --- a/sycl/test-e2e/Regression/static-buffer-dtor.cpp +++ b/sycl/test-e2e/Regression/static-buffer-dtor.cpp @@ -24,16 +24,28 @@ #include int main() { - uint8_t *h_A = (uint8_t *)malloc(256); + static sycl::buffer bufs[2] = {sycl::range<1>(256), sycl::range<1>(256)}; sycl::queue q; q.submit([&](sycl::handler &cgh) { - cgh.copy(h_A, bufs[0].get_access(cgh)); + auto acc = bufs[0].get_access(cgh); + cgh.single_task([=]() { + for (int i = 0; i < 256; i++) { + acc[i] = 24; + } + }); }); + q.submit([&](sycl::handler &cgh) { - cgh.copy(h_A, bufs[1].get_access(cgh)); + auto acc = bufs[1].get_access(cgh); + cgh.single_task([=]() { + for (int i = 0; i < 256; i++) { + acc[i] = 25; + } + }); }); - free(h_A); + + // no q.wait() return 0; } diff --git a/sycl/unittests/CMakeLists.txt b/sycl/unittests/CMakeLists.txt index aaad18cbde4f5..c08e96ea5330a 100644 --- a/sycl/unittests/CMakeLists.txt +++ b/sycl/unittests/CMakeLists.txt @@ -43,7 +43,6 @@ add_subdirectory(pipes) add_subdirectory(program_manager) add_subdirectory(assert) add_subdirectory(Extensions) -add_subdirectory(windows) add_subdirectory(event) add_subdirectory(buffer) add_subdirectory(context_device) diff --git a/sycl/unittests/buffer/BufferReleaseBase.hpp b/sycl/unittests/buffer/BufferReleaseBase.hpp index a4982af3b581f..322b85ffe469c 100644 --- a/sycl/unittests/buffer/BufferReleaseBase.hpp +++ b/sycl/unittests/buffer/BufferReleaseBase.hpp @@ -16,7 +16,6 @@ #include #include -#include #include #include diff --git a/sycl/unittests/windows/CMakeLists.txt b/sycl/unittests/windows/CMakeLists.txt deleted file mode 100644 index 6143d5de55045..0000000000000 --- a/sycl/unittests/windows/CMakeLists.txt +++ /dev/null @@ -1,4 +0,0 @@ -add_sycl_unittest(WindowsDllMainTest OBJECT - dllmain.cpp -) - diff --git a/sycl/unittests/windows/dllmain.cpp b/sycl/unittests/windows/dllmain.cpp deleted file mode 100644 index 79c41981f426b..0000000000000 --- a/sycl/unittests/windows/dllmain.cpp +++ /dev/null @@ -1,62 +0,0 @@ -//==----- dllmain.cpp --- verify behaviour of lib on process termination ---==// -// -// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. -// See https://llvm.org/LICENSE.txt for license information. -// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception -// -//===----------------------------------------------------------------------===// - -/* - * This test calls DllMain on Windows. This means, the process performs actions - * which are required for library unload. That said, the test requires to be a - * distinct binary executable. - */ - -#include -#include - -#include - -#ifdef _WIN32 -#include - -extern "C" BOOL WINAPI DllMain(HINSTANCE hinstDLL, DWORD fdwReason, - LPVOID lpReserved); - -static std::atomic TearDownCalls{0}; - -// Before the port this was an override for LoaderTearDown, UR's mock -// functionality can't override loader functions but AdapterRelease is called -// in the runtime in the same place as LoaderTearDown -ur_result_t redefinedAdapterRelease(void *) { - fprintf(stderr, "intercepted tear down\n"); - ++TearDownCalls; - - return UR_RESULT_SUCCESS; -} -#endif - -TEST(Windows, DllMainCall) { -#ifdef _WIN32 - sycl::unittest::UrMock<> Mock; - sycl::platform Plt = sycl::platform(); - mock::getCallbacks().set_before_callback("urAdapterRelease", - &redefinedAdapterRelease); - - // Teardown calls are only expected on sycl.dll library unload, not when - // process gets terminated. - // The first call to DllMain is to simulate library unload. The second one - // is to simulate process termination - fprintf(stderr, "Call DllMain for the first time\n"); - DllMain((HINSTANCE)0, DLL_PROCESS_DETACH, (LPVOID)NULL); - - int TearDownCallsDone = TearDownCalls.load(); - - EXPECT_NE(TearDownCallsDone, 0); - - fprintf(stderr, "Call DllMain for the second time\n"); - DllMain((HINSTANCE)0, DLL_PROCESS_DETACH, (LPVOID)0x01); - - EXPECT_EQ(TearDownCalls.load(), TearDownCallsDone); -#endif -} diff --git a/unified-runtime/source/adapters/level_zero/event.cpp b/unified-runtime/source/adapters/level_zero/event.cpp index f23cab32c5a5b..fd1883870e0d4 100644 --- a/unified-runtime/source/adapters/level_zero/event.cpp +++ b/unified-runtime/source/adapters/level_zero/event.cpp @@ -496,7 +496,7 @@ ur_result_t urEventGetInfo( auto HostVisibleEvent = Event->HostVisibleEvent; if (Event->Completed) { Result = UR_EVENT_STATUS_COMPLETE; - } else if (HostVisibleEvent) { + } else if (HostVisibleEvent && checkL0LoaderTeardown()) { ze_result_t ZeResult; ZeResult = ZE_CALL_NOCHECK(zeEventQueryStatus, (HostVisibleEvent->ZeEvent)); diff --git a/unified-runtime/source/adapters/level_zero/queue.cpp b/unified-runtime/source/adapters/level_zero/queue.cpp index e0534af478ae7..da68a331a353b 100644 --- a/unified-runtime/source/adapters/level_zero/queue.cpp +++ b/unified-runtime/source/adapters/level_zero/queue.cpp @@ -599,6 +599,9 @@ ur_result_t urQueueRetain( ur_result_t urQueueRelease( /// [in] handle of the queue object to release ur_queue_handle_t Queue) { + if (!checkL0LoaderTeardown()) + return UR_RESULT_SUCCESS; + std::vector EventListToCleanup; { std::scoped_lock Lock(Queue->Mutex); diff --git a/unified-runtime/source/adapters/opencl/event.cpp b/unified-runtime/source/adapters/opencl/event.cpp index 1717d75dfceda..dc017ee3947f2 100644 --- a/unified-runtime/source/adapters/opencl/event.cpp +++ b/unified-runtime/source/adapters/opencl/event.cpp @@ -181,7 +181,10 @@ UR_APIEXPORT ur_result_t UR_APICALL urEventGetInfo(ur_event_handle_t hEvent, size_t CheckPropSize = 0; cl_int RetErr = clGetEventInfo(hEvent->CLEvent, CLEventInfo, propSize, pPropValue, &CheckPropSize); - if (pPropValue && CheckPropSize != propSize) { + if (pPropValue && CheckPropSize != propSize && + propName != UR_EVENT_INFO_COMMAND_EXECUTION_STATUS) { + // Opencl:cpu may (incorrectly) return 0 for propSize when checking + // execution status when status is CL_COMPLETE. return UR_RESULT_ERROR_INVALID_SIZE; } CL_RETURN_ON_FAILURE(RetErr);