The existing `non_empty_since` timestamp, which is already used to check for `blackout_period`, can be repurposed to track the start of the warmup period for each endpoint. This timestamp tracks when the first non-zero load report was received, marking the beginning of its warmup period. This approach eliminates the need for additional timestamp tracking while maintaining the existing functionality.
+To maintain independence between the blackout period and slow start period, a new timestamp `ready_since` will be introduced to track when an endpoint transitioned to ready state. This timestamp is separate from the existing `non_empty_since` timestamp used for the blackout period.
+
+The `ready_since` timestamp is set when an endpoint transitions from a non-ready state (e.g., CONNECTING, TRANSIENT_FAILURE) to a ready state (READY). This marks the beginning of the slow start period for that endpoint. The existing `non_empty_since` timestamp continues to be used exclusively for tracking the blackout period, which begins when the first non-zero load report is received.
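The two timestamps described above could be tracked along these lines. This is a minimal sketch, not the proposal's implementation: the `EndpointState` class, its method names, and the use of plain float seconds for time are all hypothetical, and the sketch does not model when either timestamp is reset.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ConnectivityState(Enum):
    IDLE = 0
    CONNECTING = 1
    READY = 2
    TRANSIENT_FAILURE = 3


@dataclass
class EndpointState:
    """Hypothetical per-endpoint state holding both timestamps."""
    connectivity: ConnectivityState = ConnectivityState.IDLE
    ready_since: Optional[float] = None      # start of the slow start period
    non_empty_since: Optional[float] = None  # start of the blackout period

    def on_connectivity_change(self, new_state: ConnectivityState, now: float) -> None:
        # Stamp ready_since only on a non-ready -> READY transition.
        was_ready = self.connectivity is ConnectivityState.READY
        if new_state is ConnectivityState.READY and not was_ready:
            self.ready_since = now
        self.connectivity = new_state

    def on_load_report(self, weight: float, now: float) -> None:
        # Stamp non_empty_since on the first non-zero load report only.
        if weight > 0 and self.non_empty_since is None:
            self.non_empty_since = now
```

Keeping the two fields separate means a load report never moves `ready_since`, and a connectivity change never moves `non_empty_since` in this sketch.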
Weight calculation in the WRR policy follows a two-step process. First, the base weight for each endpoint is computed using the formula from [gRFC A58][A58]:
The scaling formula ensures that new endpoints receive a gradually increasing share of traffic while maintaining a minimum threshold to prevent starvation. The `aggression` parameter allows fine-tuning of the ramp-up curve, enabling either more aggressive initial scaling (values > 1.0) or more conservative approaches (values < 1.0).
When an endpoint is not in the warmup period, the scale factor is set to 1.0, meaning the original weight is used without modification. This ensures that the slow start mechanism only affects endpoints during their initial warmup phase, after which they participate in normal load balancing based on their actual performance metrics.
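The scaling formula itself is not reproduced in this excerpt. As an illustration only, the sketch below assumes an Envoy-style slow start formula, `max(min_weight_percent / 100, time_factor ^ (1 / aggression))`, which is consistent with the behavior described for `aggression` and the minimum threshold. The function name, the `window_s` parameter, and the interpretation of `min_weight_percent` as a value in 0..100 are assumptions.

```python
def slow_start_scale(elapsed_s: float, window_s: float,
                     min_weight_percent: float, aggression: float) -> float:
    """Return the factor by which an endpoint's weight is multiplied.

    Assumed formula (Envoy-style), not taken from this document:
        time_factor = max(elapsed_s, 1) / window_s
        scale = max(min_weight_percent / 100, time_factor ** (1 / aggression))
    """
    if aggression <= 0:
        raise ValueError("aggression must be positive")
    # Outside the warmup period the weight is used unmodified.
    if window_s <= 0 or elapsed_s >= window_s:
        return 1.0
    time_factor = max(elapsed_s, 1.0) / window_s
    scale = max(min_weight_percent / 100.0, time_factor ** (1.0 / aggression))
    # Clamp so that very short windows never scale a weight above 1.0.
    return min(1.0, scale)
```

Under this assumed formula, halfway through a 30 second window the factor is 0.5 with `aggression = 1.0`, about 0.71 with `aggression = 2.0`, and 0.25 with `aggression = 0.5`.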
### Blackout Period vs Slow Start
-The WRR load balancing policy offers two independent mechanisms for handling new endpoints: the blackout period and slow start. These mechanisms can be used independently or in combination, allowing operators to choose the approach that best fits their needs.
+The WRR load balancing policy will offer two independent mechanisms for handling new endpoints: the blackout period and slow start. These mechanisms can be used independently or in combination, allowing operators to choose the approach that best fits their needs.
+
+The blackout period, which defaults to 10 seconds, begins when an endpoint receives its first non-zero load report (tracked by the `non_empty_since` timestamp). During this period, the endpoint continues to receive traffic, but instead of using the weights reported by the backend servers, the load balancer uses the mean of all backend-reported weights. This period helps prevent churn in the load balancing decisions when the set of endpoint addresses changes, ensuring that the weights used are based on stable, continuous load reporting.
-The blackout period, which defaults to 10 seconds, begins when an endpoint receives its first non-zero load report. During this period, the endpoint continues to receive traffic, but instead of using the weights reported by the backend servers, the load balancer uses the mean of all backend-reported weights. This period helps prevent churn in the load balancing decisions when the set of endpoint addresses changes, ensuring that the weights used are based on stable, continuous load reporting.
+The slow start period begins when an endpoint transitions to the ready state (tracked by the `ready_since` timestamp) and applies a gradual scaling factor to the weights over a configurable duration. This scaling is applied to whatever weight is being used (either the mean weight during the blackout period or the actual backend-reported weight after the blackout period). The slow start period operates independently of the blackout period, meaning it will continue to scale the weights regardless of whether the blackout period is still active or has ended.
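How the two periods combine can be sketched as follows. The function and parameter names are hypothetical, times are plain seconds, and a linear ramp stands in for the actual scaling formula, so this shows only the independence of the two timestamps rather than the proposal's exact arithmetic.

```python
from typing import Optional


def effective_weight(now: float,
                     reported_weight: float,
                     mean_weight: float,
                     non_empty_since: Optional[float],
                     ready_since: float,
                     blackout_period_s: float,
                     slow_start_window_s: float) -> float:
    """Pick the base weight from the blackout state, then scale it by slow start."""
    # Blackout: until non_empty_since is old enough, use the mean weight.
    # Assumption: an endpoint with no load report yet is also given the mean weight.
    in_blackout = (non_empty_since is None
                   or now - non_empty_since < blackout_period_s)
    base = mean_weight if in_blackout else reported_weight

    # Slow start: driven only by ready_since, regardless of blackout state.
    if slow_start_window_s <= 0:
        scale = 1.0
    else:
        # Linear ramp used as a stand-in for the real scaling formula.
        scale = min(1.0, max(0.0, now - ready_since) / slow_start_window_s)
    return base * scale
```

For example, with a 10 second blackout, a 30 second slow start window, `ready_since = 0`, and `non_empty_since = 2`, the endpoint at `now = 6` is still in blackout and gets the mean weight scaled by 0.2, while at `now = 15` it gets its reported weight scaled by 0.5.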
-The slow start period also begins when the endpoint receives its first non-zero load report and applies a gradual scaling factor to the weights over a configurable duration (default 30 seconds). This scaling is applied to whatever weight is being used (either the mean weight during blackout period or the actual backend-reported weight after blackout period). The slow start period operates independently of the blackout period, meaning it will continue to scale the weights regardless of whether the blackout period is still active or has ended.
+The independence of these mechanisms allows for more flexible configurations:
+- Slow start can begin immediately when an endpoint becomes ready, even before any load reports are received
+- The blackout period can continue to function as designed for weight stability, regardless of the slow start configuration
+- Both mechanisms can be tuned independently based on specific operational requirements
It is recommended to keep the blackout period shorter than the slow start period. This is because when the blackout period ends, the endpoint's weight will suddenly change from the mean weight to its actual backend-reported weight. If this weight is significantly higher than the mean (e.g., 2x the mean weight), it could cause a sudden traffic spike that defeats the purpose of gradual traffic increase. By having a longer slow start period, the scaling factor will continue to gradually increase the weight even after the blackout period ends, ensuring a smooth transition to the full backend-reported weight.
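The size of the weight jump at the end of the blackout period can be estimated with a small calculation. The sketch below makes two simplifying assumptions that are not part of the proposal: both periods start at the same instant, and the slow start ramp is linear. The function name is hypothetical.

```python
def jump_at_blackout_end(mean_weight: float, reported_weight: float,
                         blackout_s: float, slow_start_s: float) -> float:
    """Weight change at the instant the base switches from mean to reported weight."""
    # Slow start factor reached when the blackout ends (1.0 if slow start is over or disabled).
    scale = 1.0 if slow_start_s <= 0 else min(1.0, blackout_s / slow_start_s)
    return (reported_weight - mean_weight) * scale


# Mean weight 100, reported weight 200 (2x the mean), 10 second blackout.
print(jump_at_blackout_end(100.0, 200.0, 10.0, 30.0))  # slow start outlasts the blackout
print(jump_at_blackout_end(100.0, 200.0, 10.0, 5.0))   # slow start already finished
```

With the longer slow start window the step is only a third of the difference between the two weights, and the remainder is phased in over the rest of the window. With the shorter window the full difference is applied at once.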
-When endpoint weights become stale after the `weight_expiration_period`, the load balancer will continue to use the mean weight for load balancing. This is different from the blackout period as it's a response to weight staleness rather than initial endpoint setup. In these cases, since the endpoint weights were previously active, the slow start period is typically not triggered. However, when new weights arrive after expiration, the endpoint will enter the blackout period again, and if slow start is configured, the weights will be scaled up gradually during this period.
+When endpoint weights become stale after the `weight_expiration_period`, the load balancer will continue to use the mean weight for load balancing. This is different from the blackout period as it's a response to weight staleness rather than initial endpoint setup. In these cases, since the endpoint weights were previously active, the slow start period is typically not triggered. When new weights arrive after expiration, the endpoint will enter the blackout period again but not the slow start period, since there was no subchannel state transition and no gradual traffic increase is expected in this case.
These mechanisms can be configured in different ways:
- Using only blackout period: Ensures stable weight reporting by using mean weights before switching to backend weights
@@ -146,14 +153,15 @@ min_weight_percent: float
aggression: float
// State
-non_empty_since: Time // Time when first non-zero load report was received
+non_empty_since: Time // Time when first non-zero load report was received (for blackout period)
+ready_since: Time // Time when endpoint transitioned to ready state (for slow start period)