Increased rate of checkpoint restore failures on newer gVisor versions #11842

@cweld510

Description

We're attempting to upgrade from runsc version 1a9abee80b7c to fb842aab7730 and we've observed an elevated rate of checkpoint restore failures (~0.1%) as a result. The errors are relatively inscrutable, so I'm posting here for any guidance the gVisor team might have in debugging further.

The error we see externally is:
starting container: restoring container "ta-01JYF9S88A165GXZKPKV9PYQ8M": urpc method "containerManager.Restore" failed: EOF.

I've attached debug logs from gVisor from a failed restore. They don't seem to offer much more insight -- there's no particular line of code which appears to be throwing.

runsc.log.20250624-151930.433342.restore.txt
runsc.log.20250624-151930.433342.boot.txt

Metadata

Labels

type: bug (Something isn't working)
