
Redesigned ContainerExecDecorator #2832

@jglick

Description


```java
ExecWatch watch = nodeContext
        .getPodResource()
        .inContainer(containerName)
        .redirectingInput(STDIN_BUFFER_SIZE) // JENKINS-50429
        .writingOutput(stream)
        .writingError(stream)
        .usingListener(new ExecListener() {
            @Override
            public void onOpen() {
                alive.set(true);
                started.countDown();
                startAlive.set(System.nanoTime());
                LOGGER.log(Level.FINEST, "onOpen : {0}", finished);
            }

            @Override
            public void onFailure(Throwable t, Response response) {
                alive.set(false);
                t.printStackTrace(launcher.getListener().getLogger());
                started.countDown();
                LOGGER.log(Level.FINEST, "onFailure : {0}", finished);
                if (finished.getCount() == 0) {
                    LOGGER.log(
                            Level.WARNING,
                            "onFailure called but latch already finished. This may be a bug in the kubernetes-plugin");
                }
                finished.countDown();
            }

            @Override
            public void onClose(int i, String s) {
                alive.set(false);
                started.countDown();
                LOGGER.log(Level.FINEST, "onClose : {0} [{1} ms]", new Object[] {
                    finished,
                    TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startAlive.get())
                });
                if (finished.getCount() == 0) {
                    LOGGER.log(
                            Level.WARNING,
                            "onClose called but latch already finished. This indicates a bug in the kubernetes-plugin");
                }
                finished.countDown();
            }
        })
        .exec(sh);
```
This is the equivalent of `kubectl exec`: every process launch goes through the API server, which is fine for debugging or occasional scripting but not suited to running at scale.

The container should instead run a listener (say, on a Unix-domain socket; TBD what to do on Windows) and await commands from the agent container. These should be sent over Remoting.
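As a rough illustration of the listener side, here is a minimal sketch using the Java 16+ `UnixDomainSocketAddress` API. The `ContainerCommandListener` class name and the trivial `ack:` wire format are hypothetical stand-ins; a real implementation would speak Remoting over the channel rather than this toy protocol.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.StandardProtocolFamily;
import java.net.UnixDomainSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ContainerCommandListener {
    /**
     * Binds a Unix-domain socket, accepts a single connection, and acknowledges
     * the command it receives. Returns the reply seen by the client side.
     */
    public static String roundTrip(Path socketPath, String command) throws Exception {
        Files.deleteIfExists(socketPath);
        UnixDomainSocketAddress addr = UnixDomainSocketAddress.of(socketPath);
        try (ServerSocketChannel server = ServerSocketChannel.open(StandardProtocolFamily.UNIX)) {
            server.bind(addr);
            // Listener side: would run inside the target container.
            Thread listener = new Thread(() -> {
                try (SocketChannel peer = server.accept()) {
                    ByteBuffer buf = ByteBuffer.allocate(256);
                    peer.read(buf);
                    buf.flip();
                    String received = StandardCharsets.UTF_8.decode(buf).toString();
                    peer.write(StandardCharsets.UTF_8.encode("ack:" + received));
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            listener.start();
            // Client side: would be the agent container sending a command.
            try (SocketChannel client = SocketChannel.open(addr)) {
                client.write(StandardCharsets.UTF_8.encode(command));
                client.shutdownOutput();
                ByteBuffer reply = ByteBuffer.allocate(256);
                client.read(reply);
                reply.flip();
                listener.join();
                return StandardCharsets.UTF_8.decode(reply).toString();
            }
        } finally {
            Files.deleteIfExists(socketPath);
        }
    }

    public static void main(String[] args) throws Exception {
        Path sock = Path.of(System.getProperty("java.io.tmpdir"), "container-step.sock");
        System.out.println(roundTrip(sock, "sh -c 'echo hello'"));
    }
}
```

The key property is that both ends share an `emptyDir`-style volume for the socket file, so no traffic touches the API server once the pod is running.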

The key difficulty is how to start this listener. Existing Pipelines normally run `sleep infinity` as the container entry point. Does the Pipeline definition need to change? The controller could exec a `nohup` command to start the listener, but then we are back to using the API server while the pod is running; perhaps less than before, but it is not clear this would be much of an improvement. The controller could quietly rewrite the entry point when creating the pod, but at that time it does not know which containers, if any, will be used by the `container` step, so this could break sidecar containers such as databases. It would need to search for known "sleepy" patterns.
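A heuristic for the "sleepy" pattern search might look like the following sketch. The class name and the set of commands it matches are illustrative guesses, not an agreed-upon list:

```java
import java.util.List;
import java.util.Set;

public class SleepyEntryPointDetector {
    // Hypothetical set of entry points commonly used just to keep a container alive.
    private static final Set<String> SLEEPY = Set.of("sleep", "pause", "cat", "tail");

    /**
     * Returns true if the container command looks like a do-nothing keepalive,
     * so its entry point could be safely rewritten to start the listener.
     */
    public static boolean isSleepy(List<String> command) {
        if (command == null || command.isEmpty()) {
            return false;
        }
        String exe = command.get(0);
        // Compare on the basename so /bin/sleep and sleep match alike.
        String base = exe.substring(exe.lastIndexOf('/') + 1);
        return SLEEPY.contains(base);
    }

    public static void main(String[] args) {
        System.out.println(isSleepy(List.of("sleep", "infinity")));
    }
}
```

Anything not matching would be left untouched, which is what protects sidecars like databases from having their real entry points clobbered.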

All of this poses a high risk of regression in unusual environments, and Windows support might lag behind Linux, so there would need to be system properties to opt in or out.
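The opt-in/out could be an ordinary system property check read once at startup; the property name and class below are hypothetical, sketched only to show the shape:

```java
public class ExecTransportToggle {
    // Hypothetical property name; the real plugin would pick its own namespace.
    private static final String PROP = "org.csanchez.jenkins.plugins.kubernetes.useRemotingExec";

    /** Defaults to the legacy API-server exec transport unless explicitly enabled. */
    public static boolean remotingExecEnabled() {
        return Boolean.parseBoolean(System.getProperty(PROP, "false"));
    }

    public static void main(String[] args) {
        System.out.println(remotingExecEnabled());
    }
}
```

Defaulting to off matches the usual Jenkins convention for risky new behavior: the old code path stays in place until the new one has proven itself.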

Some references:

Not related to scalability but: #1724

CloudBees-internal link
