sporadic "forwarding Ping: no such job" errors in CI #5171

Open
cburroughs opened this issue Jul 18, 2024 · 2 comments

Comments

@cburroughs

We run our CI runners (specifically, GitLab CI runners) on k8s using docker-in-docker + buildx. "Sometimes" docker builds fail with:

stderr:

ERROR: NotFound: forwarding Ping: no such job 7zy6tgcxqlcljaw8fsd6xs4dm

I'm opening the issue against this repository since it is where the error string appears to reside:

return nil, errors.Wrap(err, "forwarding Ping")

Unfortunately I have not found a clear pattern or repro case, and every time I search the only other English result is earthly/earthly#3454

Version Information (from build pod):

$ uname -a
Linux runner-qn7qyr8ex-project-40783171-concurrent-14-ksipf5ce 6.1.94-99.176.amzn2023.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Jun 18 14:57:56 UTC 2024 x86_64 GNU/Linux
$ docker info
Client: Docker Engine - Community
 Version:    27.0.3
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.15.1
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.28.1
    Path:     /usr/libexec/docker/cli-plugins/docker-compose
Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 0
 Server Version: 27.0.3
 Storage Driver: overlayfs
  driver-type: io.containerd.snapshotter.v1
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: ae71819c4f5e67bb4d5ae76a6b735f29cc25774e
 runc version: v1.1.13-0-g58aa920
 init version: de40ad0
 Security Options:
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 6.1.94-99.176.amzn2023.x86_64
 Operating System: Alpine Linux v3.20 (containerized)
 OSType: linux
 Architecture: x86_64
 CPUs: 32
 Total Memory: 240.2GiB
 Name: runner-qn7qyr8ex-project-40783171-concurrent-14-ksipf5ce
 ID: 51762a1f-2261-4fdb-84b8-c90c1445cbc1
 Docker Root Dir: /builds/docker-data-root
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
 Product License: Community Engine
[DEPRECATION NOTICE]: API is accessible on http://0.0.0.0:2375/ without encryption.
         Access to the remote API is equivalent to root access on the host. Refer
         to the 'Docker daemon attack surface' section in the documentation for
         more information: https://docs.docker.com/go/attack-surface/
In future versions this will be a hard failure preventing the daemon from starting! Learn more at: https://docs.docker.com/go/api-security/
$ neofetch
       _,met$$$$$gg.
    ,g$$$$$$$$$$$$$$$P.
  ,g$$P"     """Y$$.".
 ,$$P'              `$$$.
',$$P       ,ggs.     `$$b:
`d$$'     ,$P"'   .    $$$
 $$P      d$'     ,    $$P
 $$:      $$.   -    ,d$$'
 $$;      Y$b._   _,d$P'
 Y$$.    `.`"Y$$$$P"'
 `$$b      "-.__
  `Y$$
   `Y$$.
     `$$b.
       `Y$$b.
          `"Y$b._
              `"""
root@runner-qn7qyr8ex-project-40783171-concurrent-14-ksipf5ce 
------------------------------------------------------------- 
OS: Debian GNU/Linux 12 (bookworm) x86_64 
Host: HVM domU 4.11.amazon 
Kernel: 6.1.94-99.176.amzn2023.x86_64 
Uptime: 3 mins 
Packages: 456 (dpkg) 
Shell: bash 5.2.15 
CPU: Intel Xeon E5-2670 v2 (32) @ 2.493GHz 
Memory: 4793MiB / 245942MiB
@mispencer

We have seen this error many times in our CI docker builds as well. It appears to occur when the hosting VM is low on system resources.

@fourdim

fourdim commented Sep 15, 2024

I also encountered this error when running docker compose up with more than 600 containers on a single machine.
Can we extend the timeout here?

ctx, _ = context.WithTimeoutCause(ctx, 3*time.Second, errors.WithStack(context.DeadlineExceeded))
