Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DSE] Mark promise of pre-split coroutine visible to caller #133918

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

NewSigma
Copy link
Contributor

@NewSigma NewSigma commented Apr 1, 2025

Currently DSE does not recognize that the coro frame is visible to the caller. It incorrectly eliminates stores right before returning to the caller, even though these values will be used upon resumption. This commit marks promise of pre-split coroutine visible to caller to avoid incorrect elimination.

Fix #123347

Allocas are destroyed when returning from functions. However, this is not the case for pre-split coroutines. Any premature elimination will lead to side effects.

Fix 123347
@NewSigma
Copy link
Contributor Author

NewSigma commented Apr 2, 2025

Request code review from @nikic and @ChuanqiXu9

@hstk30-hw hstk30-hw requested review from nikic and ChuanqiXu9 April 2, 2025 12:01
Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some more detail to the issue description why coroutine semantics require this? This hack looks very problematic to me, and seems quite distinct from the existing coroutine workarounds we have (which are about the possibility of the thread identity changing across suspension points).

@@ -0,0 +1,41 @@
; Test that store-load operation that crosses suspension point will not be eliminated by DSE before CoroSplit
; RUN: opt < %s -passes='dse,verify' -S | FileCheck %s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
; RUN: opt < %s -passes='dse,verify' -S | FileCheck %s
; RUN: opt < %s -passes=dse -S | FileCheck %s

verify is implicit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deleted

declare void @llvm.lifetime.end.p0(i64, ptr)
declare void @llvm.coro.await.suspend.void(ptr, ptr, ptr)
declare i8 @llvm.coro.suspend(token, i1)
declare i1 @llvm.coro.end(ptr, i1, token)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to declare coro intrinsics.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deleted

; Test that store-load operation that crosses suspension point will not be eliminated by DSE before CoroSplit
; RUN: opt < %s -passes='dse,verify' -S | FileCheck %s

define void @fn(ptr align 8 %0) presplitcoroutine {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please use named values in the test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

if (!isInvisibleToCallerOnUnwind(V)) {
I.first->second = false;
} else if (isNoAliasCall(V)) {
if (isInvisibleToCallerOnUnwind(V) && isNoAliasCall(V))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this part of the change just an NFC refactor? If so, can you please directly land it separately? It's confusing to have it as part of this change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I will land it separately.

@llvmbot
Copy link
Member

llvmbot commented Apr 3, 2025

@llvm/pr-subscribers-llvm-transforms

Author: None (NewSigma)

Changes

Allocas are destroyed when returning from functions. However, this is not the case for pre-split coroutines, because coroutine frame should be visible to caller. For example, one can write to the coroutine's promise, suspend, and later read from the caller. Eliminating such stores would introduce side effects.

This commit forces that all allocas of pre-split coroutines remain visible to the caller. While this may miss some optimization opportunities, correctness takes priority. Future work could analyze the lifetimes of allocas if performance regressions become significant.

Fix #123347


Full diff: https://github.com/llvm/llvm-project/pull/133918.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp (+3-1)
  • (added) llvm/test/Transforms/DeadStoreElimination/coro-alloca.ll (+33)
diff --git a/llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp b/llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
index 935f21fd484f3..780b64e70136f 100644
--- a/llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
+++ b/llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
@@ -1194,7 +1194,9 @@ struct DSEState {
 
   bool isInvisibleToCallerAfterRet(const Value *V) {
     if (isa<AllocaInst>(V))
-      return true;
+      // Defer alloca store elimination, wait for CoroSplit
+      return !F.isPresplitCoroutine();
+
     auto I = InvisibleToCallerAfterRet.insert({V, false});
     if (I.second) {
       if (!isInvisibleToCallerOnUnwind(V)) {
diff --git a/llvm/test/Transforms/DeadStoreElimination/coro-alloca.ll b/llvm/test/Transforms/DeadStoreElimination/coro-alloca.ll
new file mode 100644
index 0000000000000..ec9dc84f2c4ae
--- /dev/null
+++ b/llvm/test/Transforms/DeadStoreElimination/coro-alloca.ll
@@ -0,0 +1,33 @@
+; Test that store-load operation that crosses suspension point will not be eliminated by DSE before CoroSplit
+; RUN: opt < %s -passes='dse' -S | FileCheck %s
+
+define void @fn(ptr align 8 %arg) presplitcoroutine {
+  %promise = alloca ptr, align 8
+  %awaiter = alloca i8, align 1
+  %id = call token @llvm.coro.id(i32 16, ptr %promise, ptr @fn, ptr null)
+  %hdl = call ptr @llvm.coro.begin(token %id, ptr null)
+  %mem = call ptr @malloc(i64 1)
+  call void @llvm.lifetime.start.p0(i64 8, ptr %promise)
+  store ptr %mem, ptr %promise, align 8
+  %save = call token @llvm.coro.save(ptr null)
+  call void @llvm.coro.await.suspend.void(ptr %awaiter, ptr %hdl, ptr @await_suspend_wrapper_void)
+  %sp = call i8 @llvm.coro.suspend(token %save, i1 false)
+  %flag = icmp ule i8 %sp, 1
+  br i1 %flag, label %resume, label %suspend
+
+resume:
+  call void @llvm.lifetime.end.p0(i64 8, ptr %promise)
+  br label %suspend
+
+suspend:
+  call i1 @llvm.coro.end(ptr null, i1 false, token none)
+  %temp = load ptr, ptr %promise, align 8
+  store ptr %temp, ptr %arg, align 8
+; store when suspend, load when resume
+; CHECK: store ptr null, ptr %promise, align 8
+  store ptr null, ptr %promise, align 8
+  ret void
+}
+
+declare ptr @malloc(i64)
+declare void @await_suspend_wrapper_void(ptr, ptr)

@NewSigma
Copy link
Contributor Author

NewSigma commented Apr 3, 2025

Thanks. I updated my issue description.

@ChuanqiXu9
Copy link
Member

Allocas are destroyed when returning from functions. However, this is not the case for pre-split coroutines, because coroutine frame should be visible to caller. For example, one can write to the coroutine's promise, suspend, and later read from the caller. Eliminating such stores would introduce side effects.

Could you elaborate this? e.g, give a example to describe why it is problematic. I didn't understand it.

This commit forces that all allocas of pre-split coroutines remain visible to the caller. While this may miss some optimization

While I didn't understand the problem, my instinct reaction is, even if we want to do something like this, maybe we can only do this for special allocas, like the promise alloca.

Copy link

github-actions bot commented Apr 3, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@NewSigma
Copy link
Contributor Author

NewSigma commented Apr 3, 2025

Could you elaborate this? e.g, give a example to describe why it is problematic. I didn't understand it.

It seems that DSE does not recognize that the coro frame is visible to the caller. It incorrectly eliminates stores right before returning to the caller, even though these values will be used upon resumption."

While I didn't understand the problem, my instinct reaction is, even if we want to do something like this, maybe we can only do this for special allocas, like the promise alloca.

Yes, this is a safer choice.

@NewSigma NewSigma changed the title [DSE] Defer alloca store elimination for CoroSplit [DSE] Mark promise of pre-split coroutine visible to caller Apr 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

17 Regression: over optimization at -O2
4 participants