Is your feature request related to a problem? Please describe.
There's an open issue for the loadbalancing exporter where spans are lost when collectors restart or scale. The idea of using the failover connector to handle this was mentioned by the author of the loadbalancing component: #36717 (comment)
We also run into this issue when load balancing spans before they're held in memory for tail sampling.
When load balancing fails, it'd be fine to fall back to exporting the spans directly to our observability vendor without sampling.
It looks like the failover connector could solve the above problem, but I wonder if the complexity of the retry logic is a deterrent to this component's adoption. The component's alpha status and non-trivial implementation make us wary of using it without thorough testing, in case it introduces bugs into the 99.9%+ happy path. Testing also seems tricky because there could be subtle race conditions, like in #36587, that rarely get exercised.
Describe the solution you'd like
An option to enable a simpler failover mode where the connector synchronously tries all exporters in priority order until one succeeds or there are none left to try.
This would solve the "load balance for sampling or export without sampling" use-case and also "export or verbosely log the failed telemetry". Not sure if there are other use-cases, but it might encourage adoption if there's an easier way to get started without tuning the retry logic.
A downside might be encouraging less resilient collector setups, where an exporter facing an outage is continually retried when it'd be better to use the existing fail-fast retry logic.
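To make the idea concrete, here's a minimal sketch of what the synchronous mode could look like, assuming the connector just holds its downstream pipelines as ordered traces consumers. The package, type, and field names are placeholders, not the connector's actual internals:

```go
package failoversketch

import (
	"context"
	"errors"

	"go.opentelemetry.io/collector/consumer"
	"go.opentelemetry.io/collector/pdata/ptrace"
)

// simpleFailover is a placeholder for the connector; consumers[0] is the
// highest-priority pipeline, consumers[1] the next, and so on.
type simpleFailover struct {
	consumers []consumer.Traces
}

// ConsumeTraces tries each pipeline in priority order and returns on the
// first success; if every pipeline rejects the batch, the joined errors are
// returned so the caller sees the whole failure.
func (f *simpleFailover) ConsumeTraces(ctx context.Context, td ptrace.Traces) error {
	var errs error
	for _, tc := range f.consumers {
		if err := tc.ConsumeTraces(ctx, td); err != nil {
			errs = errors.Join(errs, err)
			continue // fall through to the next priority level
		}
		return nil // first success wins; no retry state to track
	}
	return errs
}
```

In this mode the only state would be the ordered list of consumers, which is what makes it easier to reason about than the current retry/failback machinery.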
Describe alternatives you've considered
Experimenting with the load balancing exporter and this component to find a configuration that works decently. Just documenting this config could be enough.
Additional context
No response
Hi @swar8080, thanks for this issue. It’s an interesting idea, but my concern is that at high throughput it would add back pressure to the export pipelines if each export now had to go through multiple failed cycles in a failover scenario.
It’s an option, but I think, similarly to the retry logic, there would be certain usage patterns that this approach wouldn’t fit too well.
I have been planning to switch up the retry logic to do something similar to what you described, but for only one data point. That is, one data point would be sampled for retry evaluation and, in parallel to the main export pipeline, would synchronously go through every higher-priority pipeline.
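Very roughly, the sampled evaluation might look something like the sketch below; every name here is a placeholder and this is only an illustration of the idea, not the planned implementation:

```go
package retryevalsketch

import (
	"context"

	"go.opentelemetry.io/collector/consumer"
	"go.opentelemetry.io/collector/pdata/ptrace"
)

// probeHigherPriorities sends a single sampled payload through every pipeline
// with a higher priority than the one currently in use, in order, and reports
// the first one that accepts it (or -1 if none do). It is meant to run in a
// goroutine so the main export path is never blocked by the probing.
func probeHigherPriorities(ctx context.Context, pipelines []consumer.Traces, current int, sample ptrace.Traces) int {
	for i := 0; i < current && i < len(pipelines); i++ {
		if err := pipelines[i].ConsumeTraces(ctx, sample); err != nil {
			continue // still unhealthy, try the next level down
		}
		return i // this level accepted the sample; the router could fail back to it
	}
	return -1 // no higher-priority pipeline is healthy yet
}
```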
Not sure I'm following how this works, but I'll keep an eye out for changes to this component :)
Component(s)
connector/failover