Bug: Database connection gives up (try restarting transaction) and corrupts workspace with ContentStreamWasClosed #5713

@mhsdesign

On Slack (https://neos-project.slack.com/archives/C050C8FEK/p1763480787936899) it was reported that in rare cases a user's workspace ends up closed.

hotfix(es)

  • for experienced users: add a new ContentStreamWasReopened event to the workspace and trigger a catchup
  • for experienced users: remove the ContentStreamWasClosed event from the workspace and run ./flow subscription:replayAll (make a backup first)
  • or, if the content doesn't matter, delete the user's workspace: ./flow workspace:delete my-workspace --force

So far we have not been able to reproduce the problem, but our CI just happened to show that it is "real". The job has been quite reliable for about a year now and has probably failed only a handful of times in that period. I captured one occurrence here:

The WorkspacePublicationDuringWriting test should prove that, while nodes are being written excessively to the live workspace, another user publishing a workspace to live is handled correctly. Under normal circumstances the two processes would alternately claim a lock while the other waits gracefully. One accepted exception is that the user trying to publish might encounter a ConcurrencyException, which just means that the user has to wait and try again. (See below: we got ConcurrencyException: Expected version: 2, actual version: 35)
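The accepted ConcurrencyException case can be illustrated with a small sketch of an optimistic version check. This is not the Neos implementation: `EventStream`, `publish`, and the event labels are hypothetical stand-ins, and in the real system the retry is left to the user rather than done automatically.

```python
class ConcurrencyException(Exception):
    """Raised when the stream's version differs from the expected one."""


class EventStream:
    """Hypothetical in-memory stand-in for the live event stream."""

    def __init__(self):
        self.events = []

    @property
    def version(self):
        return len(self.events)

    def append(self, event, expected_version):
        # Optimistic check: refuse the write if someone else appended first.
        if self.version != expected_version:
            raise ConcurrencyException(
                f"Expected version: {expected_version}, "
                f"actual version: {self.version}"
            )
        self.events.append(event)


def publish(stream, event, expected_version, max_attempts=3):
    """Append with the caller's expected version; on a conflict,
    re-read the version and try again. Returns the attempt number."""
    for attempt in range(1, max_attempts + 1):
        try:
            stream.append(event, expected_version)
            return attempt
        except ConcurrencyException:
            if attempt == max_attempts:
                raise
            expected_version = stream.version  # re-read before retrying


# The publisher reads the version, then the writer gets in first.
live = EventStream()
stale_version = live.version
live.append("node-written", expected_version=stale_version)
attempts_needed = publish(live, "workspace-published", stale_version)
```

Here the first attempt fails because the writer has already moved the stream forward, and the second attempt succeeds after the version is re-read. That is the harmless variant of the conflict the test expects.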

Now the thing we NEVER want is that the first process, which continuously writes to live, suddenly dies because the database gives up:

SQLSTATE[HY000]: General error: 1205 Lock wait timeout exceeded; try restarting transaction

This might already corrupt the system. Worse, the second process, which is trying to publish, starts the publishing procedure by first closing its content stream, then encounters the ConcurrencyException and somehow cannot recover (in the normal case it should reopen the content stream).
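The expected close/reopen behaviour can be stated as an invariant: if publishing fails after the content stream was closed, the stream must be writable again afterwards. The following is only a sketch of that invariant, assuming a hypothetical `ContentStream` class and a hypothetical `publish_workspace` helper; it does not reflect how the Content Repository actually implements publishing, and it does not explain which code path skips the reopening in the failing CI run.

```python
class ConcurrencyException(Exception):
    """Stand-in for the exception raised on a version conflict."""


class ContentStream:
    """Hypothetical model of a user's content stream."""

    def __init__(self, name):
        self.name = name
        self.closed = False
        self.changes = []

    def close(self):
        self.closed = True

    def reopen(self):
        self.closed = False

    def write(self, change):
        if self.closed:
            raise RuntimeError(f'Content stream "{self.name}" is closed.')
        self.changes.append(change)


def publish_workspace(user_stream, append_to_live):
    """Close the user's stream, try to publish, and reopen on ANY failure.

    `append_to_live` is a callable that performs the actual publish and
    may raise (for example a ConcurrencyException or a database error).
    """
    user_stream.close()
    try:
        append_to_live()
    except Exception:
        # Compensate before propagating, so the workspace stays usable.
        user_stream.reopen()
        raise
    # On success the old stream stays closed in this sketch.
```

With this shape, a failed publish leaves the user's stream open regardless of which exception interrupted it. The CI failure shows the opposite outcome: the content stream "user-cs-id" was still closed after the ConcurrencyException.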

There was 1 error:

1) Neos\ContentRepository\BehavioralTests\Tests\Parallel\WorkspacePublicationDuringWriting\WorkspacePublicationDuringWritingTest::whileANodesArWrittenOnLive
Neos\ContentRepository\Core\Subscription\Exception\CatchUpHadErrors:

Error while catching up: Event 6264 in "contentGraph": An exception occurred while executing a query: SQLSTATE[HY000]: General error: 1205 Lock wait timeout exceeded; try restarting transaction

/home/runner/work/neos-development-collection/neos-development-collection/neos-development-distribution/Packages/Neos/Neos.ContentRepository.Core/Classes/Subscription/Exception/CatchUpHadErrors.php:33
/home/runner/work/neos-development-collection/neos-development-collection/neos-development-distribution/Packages/Neos/Neos.ContentRepository.Core/Classes/ContentRepository.php:116
/home/runner/work/neos-development-collection/neos-development-collection/neos-development-distribution/Packages/Neos/Neos.ContentRepository.BehavioralTests/Tests/Parallel/WorkspacePublicationDuringWriting/WorkspacePublicationDuringWritingTest.php:154
There was 1 failure:

1) Neos\ContentRepository\BehavioralTests\Tests\Parallel\WorkspacePublicationDuringWriting\WorkspacePublicationDuringWritingTest::thenConcurrentPublishLeadsToException

Workspace that failed to be publish cannot be written: Content stream "user-cs-id" is closed.

/home/runner/work/neos-development-collection/neos-development-collection/neos-development-distribution/Packages/Neos/Neos.ContentRepository.BehavioralTests/Tests/Parallel/WorkspacePublicationDuringWriting/WorkspacePublicationDuringWritingTest.php:252

The parallel logs are as follows:

Time: 03:11.859, Memory: 18.00 MB

WorkspacePublicationDuringWritingTest: [pid 5184, time 1765811054] ------ process started ------
WorkspacePublicationDuringWritingTest: [pid 5182, time 1765811054] ------ process started ------
WorkspacePublicationDuringWritingTest: [pid 5184, time 1765811054] setup started
WorkspacePublicationDuringWritingTest: [pid 5182, time 1765811054] waiting for setup
WorkspacePublicationDuringWritingTest: [pid 5184, time 1765811190] setup finished
WorkspacePublicationDuringWritingTest: [pid 5184, time 1765811190] waiting to publish
WorkspacePublicationDuringWritingTest: [pid 5182, time 1765811190] wait for setup finished
WorkspacePublicationDuringWritingTest: [pid 5182, time 1765811190] writing started
WorkspacePublicationDuringWritingTest: [pid 5184, time 1765811190] publish started
WorkspacePublicationDuringWritingTest: [pid 5184, time 1765811245] Got exception ConcurrencyException: Expected version: 2, actual version: 35
WorkspacePublicationDuringWritingTest: [pid 5184, time 1765811245] publish finished
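To read the timings out of the log above, here is a small helper sketch. The function names are made up for illustration; the log lines are copied from this run, and the timestamps are assumed to be Unix seconds.

```python
import re

LOG = """\
WorkspacePublicationDuringWritingTest: [pid 5184, time 1765811054] setup started
WorkspacePublicationDuringWritingTest: [pid 5184, time 1765811190] setup finished
WorkspacePublicationDuringWritingTest: [pid 5182, time 1765811190] writing started
WorkspacePublicationDuringWritingTest: [pid 5184, time 1765811190] publish started
WorkspacePublicationDuringWritingTest: [pid 5184, time 1765811245] Got exception ConcurrencyException: Expected version: 2, actual version: 35
WorkspacePublicationDuringWritingTest: [pid 5184, time 1765811245] publish finished
"""

LINE = re.compile(r"\[pid (\d+), time (\d+)\] (.+)$")


def parse(log):
    """Return (pid, timestamp, message) tuples for every matching line."""
    events = []
    for line in log.splitlines():
        match = LINE.search(line)
        if match:
            pid, time, message = match.groups()
            events.append((int(pid), int(time), message))
    return events


def seconds_between(events, pid, start_message, end_prefix):
    """Seconds from the first `start_message` of `pid` to the first
    message of the same pid that starts with `end_prefix`."""
    start = next(t for p, t, m in events if p == pid and m == start_message)
    end = next(t for p, t, m in events if p == pid and m.startswith(end_prefix))
    return end - start


def last_message(events, pid):
    """Last logged message for `pid`."""
    return [m for p, _, m in events if p == pid][-1]


events = parse(LOG)
setup_seconds = seconds_between(events, 5184, "setup started", "setup finished")
publish_seconds = seconds_between(events, 5184, "publish started", "Got exception")
```

For this run the setup took 136 seconds, and the publishing process (pid 5184) waited 55 seconds before it reported the ConcurrencyException. The last line logged by the writing process (pid 5182) in this excerpt is "writing started".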

Related #5513
