Cromwell shutdown with WorkflowStoreHeartbeatWriteActor Failed to properly process data #7792

@sulsj

Hi,
Our Cromwell service shut itself down after repeated heartbeat write failures; the relevant log messages are below. We would like to understand what caused this, if possible. Could someone recommend a way to prevent it from happening again?

Thank you.
Best,
Seung

2025-08-01 07:20:38 cromwell-system-akka.actor.default-dispatcher-2 INFO  - Message [akka.actor.Status$Failure] from Actor[akka://cromwell-system/user/cromwell-service/WorkflowStoreCoordinatedAccessActor#1955062230] to Actor[akka://cromwell-system/deadLetters] was not delivered. [1] dead letters encountered, no more dead letters will be logged. If this is not an expected behavior, then [Actor[akka://cromwell-system/deadLetters]] may have terminated unexpectedly, This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
2025-08-01 07:20:38 cromwell-system-akka.dispatchers.engine-dispatcher-38 ERROR - WorkflowStoreHeartbeatWriteActor Failed to properly process data
akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://cromwell-system/user/cromwell-service/WorkflowStoreCoordinatedAccessActor#1955062230]] after [60000 ms]. Message of type [cromwell.engine.workflow.workflowstore.WorkflowStoreCoordinatedAccessActor$WriteHeartbeats]. A typical reason for `AskTimeoutException` is that the recipient actor didn't send a reply.
    at akka.pattern.PromiseActorRef$.$anonfun$defaultOnTimeout$1(AskSupport.scala:667)
    at akka.pattern.PromiseActorRef$.$anonfun$apply$1(AskSupport.scala:688)
    at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:202)
    at scala.concurrent.ExecutionContext$parasitic$.execute(ExecutionContext.scala:222)
    at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:334)
    at akka.actor.LightArrayRevolverScheduler$$anon$3.executeBucket$1(LightArrayRevolverScheduler.scala:285)
    at akka.actor.LightArrayRevolverScheduler$$anon$3.nextTick(LightArrayRevolverScheduler.scala:289)
    at akka.actor.LightArrayRevolverScheduler$$anon$3.run(LightArrayRevolverScheduler.scala:241)
    at java.base/java.lang.Thread.run(Thread.java:829)

...

2025-08-01 07:22:38 cromwell-system-akka.dispatchers.engine-dispatcher-47 ERROR - Shutting down cromid-165b018 as at least 34 heartbeat write errors have occurred between 2025-08-01T00:17:38.006760-07:00 and 2025-08-01T00:22:38.007474-07:00 (5.0000119 minutes)
2025-08-01 07:22:38 cromwell-system-akka.actor.default-dispatcher-31 INFO  - Workflow polling stopped
2025-08-01 07:22:38 cromwell-system-akka.dispatchers.engine-dispatcher-47 ERROR - WorkflowStoreHeartbeatWriteActor Failed to properly process data
java.util.concurrent.TimeoutException: Future timed out after [1 minute]
    at scala.concurrent.impl.Promise$DefaultPromise.tryAwait0(Promise.scala:248)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:261)
    at scala.concurrent.Await$.$anonfun$result$1(package.scala:201)
    at akka.dispatch.MonitorableThreadFactory$AkkaForkJoinWorkerThread$$anon$3.block(ThreadPoolBuilder.scala:172)
    at akka.dispatch.forkjoin.ForkJoinPool.managedBlock(ForkJoinPool.java:3641)
    at akka.dispatch.MonitorableThreadFactory$AkkaForkJoinWorkerThread.blockOn(ThreadPoolBuilder.scala:170)
    at scala.concurrent.Await$.result(package.scala:124)
    at cromwell.engine.workflow.workflowstore.WorkflowStoreCoordinatedAccessActor.$anonfun$run$1(WorkflowStoreCoordinatedAccessActor.scala:25)
    at scala.util.Try$.apply(Try.scala:210)
    at cromwell.engine.workflow.workflowstore.WorkflowStoreCoordinatedAccessActor.run(WorkflowStoreCoordinatedAccessActor.scala:25)
    at cromwell.engine.workflow.workflowstore.WorkflowStoreCoordinatedAccessActor$$anonfun$receive$1.$anonfun$applyOrElse$1(WorkflowStoreCoordinatedAccessActor.scala:34)
    at cromwell.engine.workflow.workflowstore.WorkflowStoreCoordinatedAccessActor$$anonfun$receive$1.$anonfun$applyOrElse$1$adapted(WorkflowStoreCoordinatedAccessActor.scala:34)
    at mouse.AnyOps$.$bar$greater$extension(any.scala:31)
    at cromwell.engine.workflow.workflowstore.WorkflowStoreCoordinatedAccessActor$$anonfun$receive$1.applyOrElse(WorkflowStoreCoordinatedAccessActor.scala:34)
    at akka.actor.Actor.aroundReceive(Actor.scala:539)
    at akka.actor.Actor.aroundReceive$(Actor.scala:537)
    at cromwell.engine.workflow.workflowstore.WorkflowStoreCoordinatedAccessActor.aroundReceive(WorkflowStoreCoordinatedAccessActor.scala:20)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:614)
    at akka.actor.ActorCell.invoke(ActorCell.scala:583)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:268)
    at akka.dispatch.Mailbox.run(Mailbox.scala:229)
    at akka.dispatch.Mailbox.exec(Mailbox.scala:241)
    at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
