-
Notifications
You must be signed in to change notification settings - Fork 956
feat: adding telemetry-trace-sample-rate cli arg and set default to 1%
#8561
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: unstable
Are you sure you want to change the base?
Conversation
eserilev
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, just a few minor things and it should be ready for another round of review. Thanks!
lighthouse/src/main.rs
Outdated
| let sample_rate = matches | ||
| .get_one::<f64>("telemetry-trace-sample-rate") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think we should keep this a usize and allow values between 0 - 100. we try to avoid using f64 in our codebase in general and i dont think we need more granularity here than full percentage points
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm ok i'm down to convert it to u8 but this needs to be convert to ratio in f64 when we set the variant.
see below
TraceIdRatioBased([f64](https://doc.rust-lang.org/nightly/std/primitive.f64.html))
Sample a given fraction of traces. Fractions >= 1 will always sample. If the parent span is sampled, then it’s child spans will automatically be sampled. Fractions < 0 are treated as zero, but spans may still be sampled if their parent is. Note: If this is used then all Spans in a trace will become sampled assuming that the first span is sampled as it is based on the trace_id not the span_id
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gonna use u8 instead of usize given ValueParserFactory does not support usize for range, if its cool hehe
|
tagging as backwards incompatible because the default trace span sampling is reduced from 100% to 1% |
|
@eserilev should be good to go for another review. Just used the config in the BN to keep the tests simple and following similar pattern as the other cli flag tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really close! just one small thing and it should be good to go in my opinion
lighthouse/src/main.rs
Outdated
| // Calculate sample percent as a ratio (percentage / 100) | ||
| telemetry_sample_ratio = Some(matches | ||
| .get_one::<u8>("telemetry-trace-sample-rate") | ||
| .copied() | ||
| .unwrap_or(1) as f64 / 100.0); | ||
| let sampler = opentelemetry_sdk::trace::Sampler::ParentBased(Box::new( | ||
| opentelemetry_sdk::trace::Sampler::TraceIdRatioBased(telemetry_sample_ratio.unwrap()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EDIT: safe math is probably overkill here tbh, i think maybe just removing the Some wrapping and unwrapping is fine
I think we can remove the Some wrapping and unwrapping and also introduce safe math here
maybe something like:
| // Calculate sample percent as a ratio (percentage / 100) | |
| telemetry_sample_ratio = Some(matches | |
| .get_one::<u8>("telemetry-trace-sample-rate") | |
| .copied() | |
| .unwrap_or(1) as f64 / 100.0); | |
| let sampler = opentelemetry_sdk::trace::Sampler::ParentBased(Box::new( | |
| opentelemetry_sdk::trace::Sampler::TraceIdRatioBased(telemetry_sample_ratio.unwrap()), | |
| // Calculate sample percent as a ratio (percentage / 100) | |
| telemetry_sample_ratio = matches | |
| .get_one::<u8>("telemetry-trace-sample-rate") | |
| .copied() | |
| .unwrap_or(1) as f64.safe_div(100.0).unwrap_or(0.01); | |
| let sampler = opentelemetry_sdk::trace::Sampler::ParentBased(Box::new( | |
| opentelemetry_sdk::trace::Sampler::TraceIdRatioBased(telemetry_sample_ratio), |
you'll also need to add the safe math crate to lighthouse/Cargo.toml
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think safe_arith only implements traits for integers and not f64, probably because / 0 is handled by f64:inf so ya ill leave / 100 - i would also assume the sdk handles the f64:inf case but we are fine since we are hard coding the 100.0.
but ya I can remove the some on the ratio for sure
|
i think you might also need to run |
3cf9c1a to
62c8f0d
Compare
|
@eserilev changed from Option to f64, removed unused module, formatted, and ran |
Adding a new
telemetry-trace-sample-ratecli arg for both VC and BN to adjust sampling rate for OpenTelemetry tracing spans. Default will now be 1%.#8554
Proposed Changes
Simply add the cli arg then set the sampler arg in the open telemetry object constructor via below variant
https://docs.rs/opentelemetry_sdk/0.30.0/opentelemetry_sdk/trace/enum.Sampler.html#variant.TraceIdRatioBased
Additional Info
Didn't add unit tests for this given low risk and low ROI to added meaningfully test in this case
Testing
ran cargo nextest run -p lighthouse --release and all tests passed, see below