
Redesigned ContainerExecDecorator #2832

@jglick

Description


```java
ExecWatch watch = nodeContext
        .getPodResource()
        .inContainer(containerName)
        .redirectingInput(STDIN_BUFFER_SIZE) // JENKINS-50429
        .writingOutput(stream)
        .writingError(stream)
        .usingListener(new ExecListener() {
            @Override
            public void onOpen() {
                alive.set(true);
                started.countDown();
                startAlive.set(System.nanoTime());
                LOGGER.log(Level.FINEST, "onOpen : {0}", finished);
            }

            @Override
            public void onFailure(Throwable t, Response response) {
                alive.set(false);
                t.printStackTrace(launcher.getListener().getLogger());
                started.countDown();
                LOGGER.log(Level.FINEST, "onFailure : {0}", finished);
                if (finished.getCount() == 0) {
                    LOGGER.log(
                            Level.WARNING,
                            "onFailure called but latch already finished. This may be a bug in the kubernetes-plugin");
                }
                finished.countDown();
            }

            @Override
            public void onClose(int i, String s) {
                alive.set(false);
                started.countDown();
                LOGGER.log(Level.FINEST, "onClose : {0} [{1} ms]", new Object[] {
                    finished,
                    TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startAlive.get())
                });
                if (finished.getCount() == 0) {
                    LOGGER.log(
                            Level.WARNING,
                            "onClose called but latch already finished. This indicates a bug in the kubernetes-plugin");
                }
                finished.countDown();
            }
        })
        .exec(sh);
```
This is the equivalent of `kubectl exec`: every process launch goes through the API server, which is fine for debugging or occasional scripting but not suited to running at scale.

The container should instead run a listener (say, on a Unix-domain socket; TBD what to do on Windows) and await commands from the agent container. These should be sent over Remoting.
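As a rough illustration of the listener side, here is a minimal sketch using the Java 16+ `UnixDomainSocketAddress` API. The `ContainerCommandListener` class name and the trivial `ack:` wire format are hypothetical stand-ins; a real implementation would speak Remoting over the channel rather than this toy protocol.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.StandardProtocolFamily;
import java.net.UnixDomainSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ContainerCommandListener {
    /**
     * Binds a Unix-domain socket, accepts a single connection, and acknowledges
     * the command it receives. Returns the reply seen by the client side.
     */
    public static String roundTrip(Path socketPath, String command) throws Exception {
        Files.deleteIfExists(socketPath);
        UnixDomainSocketAddress addr = UnixDomainSocketAddress.of(socketPath);
        try (ServerSocketChannel server = ServerSocketChannel.open(StandardProtocolFamily.UNIX)) {
            server.bind(addr);
            // Listener side: would run inside the target container.
            Thread listener = new Thread(() -> {
                try (SocketChannel peer = server.accept()) {
                    ByteBuffer buf = ByteBuffer.allocate(256);
                    peer.read(buf);
                    buf.flip();
                    String received = StandardCharsets.UTF_8.decode(buf).toString();
                    peer.write(StandardCharsets.UTF_8.encode("ack:" + received));
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            listener.start();
            // Client side: would be the agent container sending a command.
            try (SocketChannel client = SocketChannel.open(addr)) {
                client.write(StandardCharsets.UTF_8.encode(command));
                client.shutdownOutput();
                ByteBuffer reply = ByteBuffer.allocate(256);
                client.read(reply);
                reply.flip();
                listener.join();
                return StandardCharsets.UTF_8.decode(reply).toString();
            }
        } finally {
            Files.deleteIfExists(socketPath);
        }
    }

    public static void main(String[] args) throws Exception {
        Path sock = Path.of(System.getProperty("java.io.tmpdir"), "container-step.sock");
        System.out.println(roundTrip(sock, "sh -c 'echo hello'"));
    }
}
```

The key property is that both ends share an `emptyDir`-style volume for the socket file, so no traffic touches the API server once the pod is running.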

The key difficulty is how to start this listener. Existing Pipelines normally run `sleep infinity` as the container entry point. Does the Pipeline definition need to change? The controller could exec a `nohup` command to start the listener, but then we are back to using the API server while the pod is running; perhaps less than before, but it is not clear this would be much of an improvement. The controller could quietly rewrite the entry point when creating the pod, but at that time it does not know which containers, if any, will be used by the `container` step, so this could break sidecar containers such as databases. It would need to search for known "sleepy" patterns.
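A heuristic for the "sleepy" pattern search might look like the following sketch. The class name and the set of commands it matches are illustrative guesses, not an agreed-upon list:

```java
import java.util.List;
import java.util.Set;

public class SleepyEntryPointDetector {
    // Hypothetical set of entry points commonly used just to keep a container alive.
    private static final Set<String> SLEEPY = Set.of("sleep", "pause", "cat", "tail");

    /**
     * Returns true if the container command looks like a do-nothing keepalive,
     * so its entry point could be safely rewritten to start the listener.
     */
    public static boolean isSleepy(List<String> command) {
        if (command == null || command.isEmpty()) {
            return false;
        }
        String exe = command.get(0);
        // Compare on the basename so /bin/sleep and sleep match alike.
        String base = exe.substring(exe.lastIndexOf('/') + 1);
        return SLEEPY.contains(base);
    }

    public static void main(String[] args) {
        System.out.println(isSleepy(List.of("sleep", "infinity")));
    }
}
```

Anything not matching would be left untouched, which is what protects sidecars like databases from having their real entry points clobbered.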

All of this poses a high risk of regression in unusual environments, and Windows support might lag behind Linux, so there would need to be system properties to opt in or out.
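The opt-in/out could be an ordinary system property check read once at startup; the property name and class below are hypothetical, sketched only to show the shape:

```java
public class ExecTransportToggle {
    // Hypothetical property name; the real plugin would pick its own namespace.
    private static final String PROP = "org.csanchez.jenkins.plugins.kubernetes.useRemotingExec";

    /** Defaults to the legacy API-server exec transport unless explicitly enabled. */
    public static boolean remotingExecEnabled() {
        return Boolean.parseBoolean(System.getProperty(PROP, "false"));
    }

    public static void main(String[] args) {
        System.out.println(remotingExecEnabled());
    }
}
```

Defaulting to off matches the usual Jenkins convention for risky new behavior: the old code path stays in place until the new one has proven itself.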

Some references:

Not related to scalability but: #1724

CloudBees-internal link
