Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: Potential data/write conflicts #1336

Merged
merged 4 commits into from
Mar 7, 2025
Merged

Conversation

gunli
Copy link
Contributor

@gunli gunli commented Feb 24, 2025

Fixes #1335

Master Issue: #1335

Motivation

Sync all the write calls of a connection into one goroutine to avoid potential data/write conflicts

Modifications

Sync all the write calls of a connection into one goroutine.

Verifying this change

  • Make sure that the change passes the CI checks.

Does this pull request potentially affect one of the following parts:

No

@shibd @nodece @RobertIndie @BewareMyPower @crossoverJie
If yes was chosen, please highlight the changes

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API: (yes / no)
  • The schema: (yes / no / don't know)
  • The default values of configurations: (yes / no)
  • The wire protocol: (yes / no)

Documentation

  • Does this pull request introduce a new feature? (no)
  • If yes, how is the feature documented? (not applicable / docs / GoDocs / not documented)
  • If a feature is not applicable for documentation, explain why?
  • If a feature is not documented yet in this PR, please create a followup issue for adding the documentation

@gunli
Copy link
Contributor Author

gunli commented Feb 24, 2025

This bug has been discussed in #1333.
@shibd @nodece @RobertIndie @BewareMyPower @crossoverJie

Copy link
Member

@nodece nodece left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

shibd
shibd previously requested changes Feb 26, 2025
Copy link
Member

@shibd shibd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for your contribution.

I don't understand what the issue is with different goroutines calling ctx.write. Didn't we add a lock before writing?

c.writeBufferLock.Lock()
defer c.writeBufferLock.Unlock()

If there is a issue, a single client may have multiple consumer and producer objects(many goroutines), which might use the same connection. How can we ensure they are all in the same goroutine?

BTW: Would it be possible to add unit tests to help reproduce the issue instead of guessing?

@gunli
Copy link
Contributor Author

gunli commented Feb 27, 2025

I don't understand what the issue is with different goroutines calling ctx.write

It is the same as writing a file in different threads, it will lead to data conflict.

@nodece
Copy link
Member

nodece commented Feb 27, 2025

@shibd Writing data and listening for the close event in separate goroutines can lead to inconsistencies in the response. This way may cause incorrect responses due to the race condition between the broker's successful response and the connection close.

For example:

  • Time1: The broker responds with success.
  • Time2: The connection is closed.
  • Result: Depending on timing, you may receive a 'success' response or an 'error: closed' response, which can lead to unexpected behavior."

@shibd shibd dismissed their stale review February 27, 2025 03:58

@nodece Thanks for explaining, I see.

I will revert my requested change, but I prefer to have a unit test to cover this.

@BewareMyPower
Copy link
Contributor

In addition to @nodece's explanation, currently the internalWriteData method could be called in two goroutines, which might have some underlying issues.

	go func() {
		for {
			select {
			/* ... */
			case req := <-c.incomingRequestsCh:
				/* ... */
				c.internalSendRequest(req) // [1]
			}
		}
	}()

	for {
		select {
		/* ... */
		case data := <-c.writeRequestsCh:
			/* ... */
			c.internalWriteData(data) // [2]

@gunli
Copy link
Contributor Author

gunli commented Feb 27, 2025

IT failed again, but it passed on my host, I think it is nothing to do with this PR.
image

@nodece
Copy link
Member

nodece commented Mar 6, 2025

@gunli Please check CI.

@nodece
Copy link
Member

nodece commented Mar 6, 2025

@gunli The failure of TestAvailablePermitsLeak occurs because when a consumer with ConsumerCryptoFailureActionDiscard receives a message, it sends an acknowledgment command to the broker. However, since both read and write operations are handled within the same goroutine, the write operation can block the read operation. This results in a deadlock-like situation where the read operation cannot proceed, ultimately leading to a context.DeadlineExceeded error.

To resolve this issue, read and write operations should be handled in separate goroutines to avoid blocking and ensure smooth message processing.

@gunli
Copy link
Contributor Author

gunli commented Mar 7, 2025

@gunli The failure of TestAvailablePermitsLeak occurs because when a consumer with ConsumerCryptoFailureActionDiscard receives a message, it sends an acknowledgment command to the broker. However, since both read and write operations are handled within the same goroutine, the write operation can block the read operation. This results in a deadlock-like situation where the read operation cannot proceed, ultimately leading to a context.DeadlineExceeded error.

To resolve this issue, read and write operations should be handled in separate goroutines to avoid blocking and ensure smooth message processing.

OK, I will create a new PR to fix this, it seems connection.incomingCmdCh is redundant, because reading is run in a seperate goroutine: go c.reader.readFromConnection().

@gunli
Copy link
Contributor Author

gunli commented Mar 7, 2025

Please merge #1343 first, I will rebase and update this PR after that.

@gunli gunli marked this pull request as draft March 7, 2025 03:09
@gunli
Copy link
Contributor Author

gunli commented Mar 7, 2025

mark it as draft until #1343 is merged, I will rebase and update this PR after that.

@gunli gunli marked this pull request as ready for review March 7, 2025 04:41
@crossoverJie crossoverJie merged commit 042bfcd into apache:master Mar 7, 2025
7 checks passed
@crossoverJie crossoverJie added this to the v0.15.0 milestone Mar 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug][Connection] Potential data/write conflicts
5 participants