Commit 8e6f15c

jasonwbarnett and claude committed
feat: add ping interval override for improved job pickup performance
Add --ping-interval CLI flag and BUILDKITE_AGENT_PING_INTERVAL environment variable to override the server-specified ping interval. This enables faster job pickup for performance-sensitive workloads like dynamic pipelines.

By default, agents poll every 10-20 seconds (server interval + jitter). With this feature, users can reduce latency to 5-10 seconds or other custom intervals.

Includes safeguards and comprehensive testing:
- Minimum 2-second interval to prevent server overload (values below 2s are clamped with warning)
- Only integer values supported (floats like 2.5 are rejected with clear error)
- Comprehensive unit tests for validation logic and CLI configuration
- Clear documentation of constraints and behavior

BREAKING CHANGE: None. Feature is backward compatible - when ping-interval is 0 or unspecified, the agent uses the server-provided interval as before.

Changes:
- Add PingInterval field to AgentStartConfig and AgentConfiguration
- Add --ping-interval CLI flag with BUILDKITE_AGENT_PING_INTERVAL env var
- Extract determinePingInterval() method for testable validation logic
- Add comprehensive unit tests (TestAgentWorker_PingIntervalValidation, TestAgentStartConfig_PingInterval)
- Add minimum 2-second safeguard with warning for lower values
- Change ping interval logging from debug to info level for visibility
- Update documentation to clarify integer-only constraint and minimum value
- Add comprehensive documentation in CHANGELOG, README, and docs/

Co-Authored-By: Claude <[email protected]>
1 parent 81b9177 commit 8e6f15c
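
The 10-20 second and 5-10 second figures quoted in the commit message follow from simple arithmetic once jitter is taken to range from 0 up to one full interval, which is how the docs/agent-start.md change in this commit describes it. A minimal Go sketch of that arithmetic; `pickupWindow` is a hypothetical helper for illustration, not a function in the agent:

```go
package main

import "fmt"

// pickupWindow returns the shortest and longest gap, in seconds, between two
// pings when each ping waits intervalSeconds plus a random jitter of
// 0..intervalSeconds. Hypothetical helper, not part of the agent codebase.
func pickupWindow(intervalSeconds int) (shortest, longest int) {
	return intervalSeconds, 2 * intervalSeconds
}

func main() {
	// 10 is the typical server-provided interval named in the commit;
	// 5 and 2 are example overrides (2 is the enforced minimum).
	for _, interval := range []int{10, 5, 2} {
		lo, hi := pickupWindow(interval)
		fmt.Printf("interval=%ds -> pings every %d-%ds\n", interval, lo, hi)
	}
}
```

With the 2-second minimum, the tightest window this model allows is 2-4 seconds.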

9 files changed, +220 -2 lines changed
CHANGELOG.md

Lines changed: 8 additions & 0 deletions
@@ -5,6 +5,14 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
 and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).
 
+## [Unreleased]
+
+### Added
+- Add `--ping-interval` flag to override server-specified ping interval for improved job pickup performance [#XXXX](https://github.com/buildkite/agent/pull/XXXX) (@jasonbarnett)
+
+### Changed
+- Change ping interval logging from debug to info level for better visibility [#XXXX](https://github.com/buildkite/agent/pull/XXXX) (@jasonbarnett)
+
 ## [v3.104.0](https://github.com/buildkite/agent/tree/v3.104.0) (2025-09-05)
 [Full Changelog](https://github.com/buildkite/agent/compare/v3.103.1...v3.104.0)

README.md

Lines changed: 12 additions & 0 deletions
@@ -81,6 +81,18 @@ Agents page within Buildkite, and a build path. For example:
 buildkite-agent start --token=<your token> --build-path=/tmp/buildkite-builds
 ```
 
+### Performance Optimization
+
+By default, agents poll for jobs every 10-20 seconds (server-specified interval plus random jitter). For performance-sensitive workloads like dynamic pipelines, you can reduce job pickup latency:
+
+```bash
+# Faster job pickup (5-10 seconds instead of 10-20 seconds)
+# Integer values only, minimum value is 2 seconds
+buildkite-agent start --token=<your token> --build-path=/tmp/buildkite-builds --ping-interval=5
+```
+
+See the [agent documentation](docs/agent-start.md#ping-interval) for more details.
+
 ### Telemetry
 
 By default, the agent sends some information back to the Buildkite mothership on

agent/agent_configuration.go

Lines changed: 1 addition & 0 deletions
@@ -50,6 +50,7 @@ type AgentConfiguration struct {
 	DisconnectAfterJob bool
 	DisconnectAfterIdleTimeout int
 	DisconnectAfterUptime int
+	PingInterval int
 	CancelGracePeriod int
 	SignalGracePeriod time.Duration
 	EnableJobLogTmpfile bool

agent/agent_worker.go

Lines changed: 19 additions & 1 deletion
@@ -146,6 +146,24 @@ func (a *AgentWorker) getCurrentJobID() string {
 	return a.currentJobID
 }
 
+// determinePingInterval determines the ping interval to use, applying validation and logging
+func (a *AgentWorker) determinePingInterval() time.Duration {
+	if a.agentConfiguration.PingInterval != 0 {
+		// Use the override ping interval if specified, with a minimum of 2 seconds
+		if a.agentConfiguration.PingInterval < 2 {
+			a.logger.Warn("Ping interval override %ds is below minimum of 2s, using 2s instead", a.agentConfiguration.PingInterval)
+			return 2 * time.Second
+		} else {
+			pingInterval := time.Duration(a.agentConfiguration.PingInterval) * time.Second
+			a.logger.Info("Using ping interval override: %ds", int(pingInterval.Seconds()))
+			return pingInterval
+		}
+	} else {
+		// Use the server-specified ping interval
+		return time.Duration(a.agent.PingInterval) * time.Second
+	}
+}
+
 type errUnrecoverable struct {
 	action string
 	response *api.Response
@@ -317,7 +335,7 @@ func (a *AgentWorker) runPingLoop(ctx context.Context, idleMonitor *IdleMonitor)
 	disconnectAfterIdleTimeout := time.Second * time.Duration(a.agentConfiguration.DisconnectAfterIdleTimeout)
 
 	// Create the ticker
-	pingInterval := time.Second * time.Duration(a.agent.PingInterval)
+	pingInterval := a.determinePingInterval()
 	pingTicker := time.NewTicker(pingInterval)
 	defer pingTicker.Stop()
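
The three-way decision in `determinePingInterval` (no override, override below the minimum, valid override) can also be written with early returns and without the logger dependency. The sketch below is one possible restructuring for illustration only, not what the commit does; `effectivePingSeconds` and `minPingSeconds` are hypothetical names, and the `clamped` result stands in for the warning log:

```go
package main

import (
	"fmt"
	"time"
)

// minPingSeconds mirrors the 2-second floor enforced by the commit.
const minPingSeconds = 2

// effectivePingSeconds is a hypothetical, logger-free variant of the decision
// made in determinePingInterval. It returns the interval in seconds and
// whether the override had to be clamped up to the minimum.
func effectivePingSeconds(override, server int) (seconds int, clamped bool) {
	if override == 0 {
		// No override configured: use the server-provided interval.
		return server, false
	}
	if override < minPingSeconds {
		// Too small (including negative values): clamp to the floor.
		return minPingSeconds, true
	}
	return override, false
}

func main() {
	// Same inputs as the table in TestAgentWorker_PingIntervalValidation,
	// with a server-provided interval of 10 seconds.
	for _, override := range []int{0, 5, 2, 1, -5} {
		secs, clamped := effectivePingSeconds(override, 10)
		fmt.Println(override, time.Duration(secs)*time.Second, clamped)
	}
}
```

Returning the clamped flag instead of logging inside the function would let a caller decide how to report it, at the cost of one more return value.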

agent/agent_worker_test.go

Lines changed: 104 additions & 0 deletions
@@ -913,6 +913,110 @@ func TestAgentWorker_SetRequestHeadersDuringRegistration(t *testing.T) {
 	}
 }
 
+func TestAgentWorker_PingIntervalValidation(t *testing.T) {
+	tests := []struct {
+		name string
+		configuredPingInterval int
+		serverPingInterval int
+		expectedInterval time.Duration
+		expectWarning bool
+		expectOverrideLog bool
+	}{
+		{
+			name: "uses server interval when override is 0",
+			configuredPingInterval: 0,
+			serverPingInterval: 10,
+			expectedInterval: 10 * time.Second,
+			expectWarning: false,
+			expectOverrideLog: false,
+		},
+		{
+			name: "uses override when valid (5s)",
+			configuredPingInterval: 5,
+			serverPingInterval: 10,
+			expectedInterval: 5 * time.Second,
+			expectWarning: false,
+			expectOverrideLog: true,
+		},
+		{
+			name: "uses override when valid (2s minimum)",
+			configuredPingInterval: 2,
+			serverPingInterval: 10,
+			expectedInterval: 2 * time.Second,
+			expectWarning: false,
+			expectOverrideLog: true,
+		},
+		{
+			name: "clamps 1s to 2s with warning",
+			configuredPingInterval: 1,
+			serverPingInterval: 10,
+			expectedInterval: 2 * time.Second,
+			expectWarning: true,
+			expectOverrideLog: false,
+		},
+		{
+			name: "clamps negative values to 2s with warning",
+			configuredPingInterval: -5,
+			serverPingInterval: 10,
+			expectedInterval: 2 * time.Second,
+			expectWarning: true,
+			expectOverrideLog: false,
+		},
+	}
+
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			// Create a test logger that captures log messages
+			logOutput := &testLogCapture{}
+			logger := logger.NewConsoleLogger(logger.NewTextPrinter(logOutput), func(int) {})
+
+			worker := &AgentWorker{
+				logger: logger,
+				agent: &api.AgentRegisterResponse{
+					PingInterval: tt.serverPingInterval,
+				},
+				agentConfiguration: AgentConfiguration{
+					PingInterval: tt.configuredPingInterval,
+				},
+			}
+
+			actualInterval := worker.determinePingInterval()
+
+			// Verify the returned interval
+			assert.Equal(t, tt.expectedInterval, actualInterval, "ping interval should match expected")
+
+			// Verify warning log
+			if tt.expectWarning {
+				assert.Contains(t, logOutput.String(), "is below minimum of 2s", "should log warning for values below 2s")
+			} else {
+				assert.NotContains(t, logOutput.String(), "is below minimum of 2s", "should not log warning for valid values")
+			}
+
+			// Verify override log
+			if tt.expectOverrideLog {
+				assert.Contains(t, logOutput.String(), "Using ping interval override", "should log override usage")
+			} else if tt.configuredPingInterval > 0 && !tt.expectWarning {
+				// If we have an override but no warning, we should still get the override log
+				assert.Contains(t, logOutput.String(), "Using ping interval override", "should log override usage for valid overrides")
+			}
+		})
+	}
+}
+
+// testLogCapture captures log output for testing
+type testLogCapture struct {
+	output []byte
+}
+
+func (t *testLogCapture) Write(p []byte) (n int, err error) {
+	t.output = append(t.output, p...)
+	return len(p), nil
+}
+
+func (t *testLogCapture) String() string {
+	return string(t.output)
+}
+
 func TestAgentWorker_UpdateRequestHeadersDuringPing(t *testing.T) {
 	t.Parallel()
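
`testLogCapture` exposes only `Write` and `String`, which is the same surface the standard library's `bytes.Buffer` already provides. The sketch below shows the capture-then-assert pattern with `bytes.Buffer` in isolation. It is a possible alternative, not a claim about the agent's logger: `logTo` is a hypothetical stand-in for a logger that writes to an `io.Writer`, and whether `logger.NewTextPrinter` would accept a `*bytes.Buffer` is an assumption inferred only from it accepting `testLogCapture`.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// logTo is a hypothetical stand-in for a logger that writes one formatted
// line to any io.Writer.
func logTo(w io.Writer, format string, args ...any) {
	fmt.Fprintf(w, format+"\n", args...)
}

// capturedWarning reports whether the below-minimum warning text ends up in
// the captured output for the given override value.
func capturedWarning(override int) bool {
	var buf bytes.Buffer // provides Write and String, like testLogCapture
	if override != 0 && override < 2 {
		logTo(&buf, "Ping interval override %ds is below minimum of 2s, using 2s instead", override)
	}
	return strings.Contains(buf.String(), "is below minimum of 2s")
}

func main() {
	fmt.Println(capturedWarning(1), capturedWarning(5)) // prints "true false"
}
```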

clicommand/agent_start.go

Lines changed: 8 additions & 0 deletions
@@ -105,6 +105,7 @@ type AgentStartConfig struct {
 	DisconnectAfterJob bool `cli:"disconnect-after-job"`
 	DisconnectAfterIdleTimeout int `cli:"disconnect-after-idle-timeout"`
 	DisconnectAfterUptime int `cli:"disconnect-after-uptime"`
+	PingInterval int `cli:"ping-interval"`
 	CancelGracePeriod int `cli:"cancel-grace-period"`
 	SignalGracePeriodSeconds int `cli:"signal-grace-period-seconds"`
 	ReflectExitStatus bool `cli:"reflect-exit-status"`
@@ -380,6 +381,12 @@ var AgentStartCommand = cli.Command{
 			Usage: "The maximum uptime in seconds before the agent stops accepting new jobs and shuts down after any running jobs complete. The default of 0 means no timeout",
 			EnvVar: "BUILDKITE_AGENT_DISCONNECT_AFTER_UPTIME",
 		},
+		cli.IntFlag{
+			Name: "ping-interval",
+			Value: 0,
+			Usage: "Override the server-specified ping interval in seconds (integer values only). The default of 0 uses the server-provided interval. Minimum value is 2 seconds",
+			EnvVar: "BUILDKITE_AGENT_PING_INTERVAL",
+		},
 		cancelGracePeriodFlag,
 		cli.BoolFlag{
 			Name: "enable-job-log-tmpfile",
@@ -1033,6 +1040,7 @@ var AgentStartCommand = cli.Command{
 			DisconnectAfterJob: cfg.DisconnectAfterJob,
 			DisconnectAfterIdleTimeout: cfg.DisconnectAfterIdleTimeout,
 			DisconnectAfterUptime: cfg.DisconnectAfterUptime,
+			PingInterval: cfg.PingInterval,
 			CancelGracePeriod: cfg.CancelGracePeriod,
 			SignalGracePeriod: signalGracePeriod,
 			EnableJobLogTmpfile: cfg.EnableJobLogTmpfile,
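
Because the flag is declared as an integer flag, a value such as `2.5` cannot be parsed and is rejected before the agent starts. In the agent that parsing is done by the CLI library, which this diff does not show. The sketch below only illustrates the same integer-only behaviour using `strconv.Atoi` from the standard library; `parsePingInterval` is a hypothetical helper:

```go
package main

import (
	"fmt"
	"strconv"
)

// parsePingInterval is a hypothetical helper showing integer-only parsing of
// a value such as BUILDKITE_AGENT_PING_INTERVAL. An empty string means the
// variable is unset, which maps to 0 (use the server-provided interval).
func parsePingInterval(raw string) (int, error) {
	if raw == "" {
		return 0, nil
	}
	n, err := strconv.Atoi(raw)
	if err != nil {
		// strconv.Atoi rejects anything that is not a base-10 integer,
		// including decimals like "2.5".
		return 0, fmt.Errorf("ping interval %q is not an integer number of seconds: %w", raw, err)
	}
	return n, nil
}

func main() {
	for _, raw := range []string{"", "5", "2.5", "abc"} {
		n, err := parsePingInterval(raw)
		fmt.Printf("%q -> %d, err=%v\n", raw, n, err)
	}
}
```

Range checks such as the 2-second minimum are deliberately left out here, since the commit applies them later in `determinePingInterval`.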

clicommand/agent_start_test.go

Lines changed: 43 additions & 0 deletions
@@ -7,6 +7,7 @@ import (
 	"runtime"
 	"testing"
 
+	"github.com/buildkite/agent/v3/agent"
 	"github.com/buildkite/agent/v3/core"
 	"github.com/buildkite/agent/v3/logger"
 	"github.com/stretchr/testify/assert"
@@ -43,6 +44,48 @@ func writeAgentHook(t *testing.T, dir, hookName, msg string) string {
 	return filepath
 }
 
+func TestAgentStartConfig_PingInterval(t *testing.T) {
+	tests := []struct {
+		name string
+		pingInterval int
+		expectedResult int
+	}{
+		{
+			name: "default ping interval (0)",
+			pingInterval: 0,
+			expectedResult: 0,
+		},
+		{
+			name: "custom ping interval (5)",
+			pingInterval: 5,
+			expectedResult: 5,
+		},
+		{
+			name: "minimum ping interval (2)",
+			pingInterval: 2,
+			expectedResult: 2,
+		},
+	}
+
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			config := AgentStartConfig{
+				PingInterval: tt.pingInterval,
+			}
+
+			// Test that the configuration value is set correctly
+			assert.Equal(t, tt.expectedResult, config.PingInterval, "AgentStartConfig.PingInterval should match input")
+
+			// Test configuration mapping (this would happen in the Action function)
+			agentConfig := agent.AgentConfiguration{
+				PingInterval: config.PingInterval,
+			}
+
+			assert.Equal(t, tt.expectedResult, agentConfig.PingInterval, "AgentConfiguration.PingInterval should match AgentStartConfig")
+		})
+	}
+}
+
 func TestAgentStartupHook(t *testing.T) {
 	t.Parallel()

core/client.go

Lines changed: 1 addition & 1 deletion
@@ -299,7 +299,7 @@ func (c *Client) Register(ctx context.Context, req api.AgentRegisterRequest) (*a
 	c.Logger.Info("Successfully registered agent \"%s\" with tags [%s]", registered.Name,
 		strings.Join(registered.Tags, ", "))
 
-	c.Logger.Debug("Ping interval: %ds", registered.PingInterval)
+	c.Logger.Info("Ping interval: %ds", registered.PingInterval)
 	c.Logger.Debug("Job status interval: %ds", registered.JobStatusInterval)
 	c.Logger.Debug("Heartbeat interval: %ds", registered.HeartbeatInterval)

docs/agent-start.md

Lines changed: 24 additions & 0 deletions
@@ -26,6 +26,30 @@ After connecting, `AgentWorker` runs two main goroutines: one periodically
 calls `Heartbeat`, the other more frequently calls `Ping`. `Ping` is how the
 worker discovers work from the API.
 
+## Ping Interval
+
+The agent polls for jobs using a ping interval specified by the Buildkite server
+during agent registration (typically 10 seconds). To prevent thundering herd
+problems, each ping includes random jitter (0 to ping-interval seconds), meaning
+jobs may take 10-20 seconds to be picked up with default settings.
+
+For performance-sensitive workloads (like dynamic pipelines), you can override
+the server-specified interval:
+
+```bash
+# Override to ping every 5 seconds (plus 0-5s jitter = 5-10s total)
+# Only integer values are supported (e.g., 2, 5, 10), not decimals
+buildkite-agent start --ping-interval 5
+
+# Or via environment variable
+export BUILDKITE_AGENT_PING_INTERVAL=5
+buildkite-agent start
+```
+
+Setting `--ping-interval 0` or omitting it uses the server-provided interval.
+Values below 2 seconds are automatically clamped to 2 seconds with a warning.
+Float values like `2.5` are not supported and will cause an error.
+
 Once a job has been accepted, the `AgentWorker` fires up a `JobRunner` to run
 it. Each `JobRunner` starts several goroutines that handle various tasks:
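
The jitter described in the new "Ping Interval" section spreads pings from many agents so they do not all hit the server at once. This commit does not show how the agent computes that jitter, so the sketch below is only an assumption-laden illustration of "interval plus a random 0 to interval delay"; `nextPingDelay` and `delaysWithinWindow` are hypothetical helpers:

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// nextPingDelay returns interval plus a random jitter in [0, interval).
// Hypothetical helper; the agent's real jitter code is not part of this diff.
func nextPingDelay(interval time.Duration, rng *rand.Rand) time.Duration {
	if interval <= 0 {
		return 0 // guard: Int63n panics on non-positive arguments
	}
	jitter := time.Duration(rng.Int63n(int64(interval)))
	return interval + jitter
}

// delaysWithinWindow draws the given number of samples with a fixed seed and
// reports whether every delay falls in [interval, 2*interval).
func delaysWithinWindow(intervalSeconds, samples int) bool {
	rng := rand.New(rand.NewSource(1))
	interval := time.Duration(intervalSeconds) * time.Second
	for i := 0; i < samples; i++ {
		d := nextPingDelay(interval, rng)
		if d < interval || d >= 2*interval {
			return false
		}
	}
	return true
}

func main() {
	rng := rand.New(rand.NewSource(time.Now().UnixNano()))
	for _, secs := range []int{10, 5} {
		interval := time.Duration(secs) * time.Second
		fmt.Printf("interval=%v next ping in %v\n", interval, nextPingDelay(interval, rng))
	}
}
```

Under this model a 5-second override yields delays between 5 and 10 seconds, matching the comment in the documentation's shell example.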
