feat(profiling): profile threading.Semaphore primitives with Python Lock profiler
#15327
Bootstrap import analysis
Comparison of import times between this PR and base.

Summary
The average import time from this PR is: 253 ± 2 ms.
The average import time from base is: 262 ± 5 ms.
The import time difference between this PR and base is: -9.5 ± 0.2 ms.

Import time breakdown
The following import paths have shrunk:
Performance SLOs
Comparing candidate vlad/proflock-implement-semaphore (ae2c3c9) with baseline main (9ba3709)

📈 Performance Regressions (2 suites)

📈 iastaspects - 118/118
✅ add_aspect: Time ✅ 0.404µs (SLO: <10.000µs 📉 -96.0%) vs baseline: +0.1%; Memory ✅ 39.459MB (SLO: <41.500MB -4.9%) vs baseline: +4.3%
✅ add_inplace_aspect: Time ✅ 0.407µs (SLO: <10.000µs 📉 -95.9%) vs baseline: +0.2%; Memory ✅ 39.852MB (SLO: <41.500MB -4.0%) vs baseline: +5.9%
✅ add_inplace_noaspect: Time ✅ 0.320µs (SLO: <10.000µs 📉 -96.8%) vs baseline: +0.7%; Memory ✅ 39.440MB (SLO: <41.500MB -5.0%) vs baseline: +4.6%
✅ add_noaspect: Time ✅ 0.280µs (SLO: <10.000µs 📉 -97.2%) vs baseline: +1.5%; Memory ✅ 39.361MB (SLO: <41.500MB -5.2%) vs baseline: +4.7%
✅ bytearray_aspect: Time ✅ 1.331µs (SLO: <10.000µs 📉 -86.7%) vs baseline: +1.7%; Memory ✅ 39.813MB (SLO: <41.500MB -4.1%) vs baseline: +5.2%
✅ bytearray_extend_aspect: Time ✅ 1.544µs (SLO: <10.000µs 📉 -84.6%) vs baseline: +0.1%; Memory ✅ 39.459MB (SLO: <41.500MB -4.9%) vs baseline: +4.9%
✅ bytearray_extend_noaspect: Time ✅ 0.617µs (SLO: <10.000µs 📉 -93.8%) vs baseline: +0.7%; Memory ✅ 39.459MB (SLO: <41.500MB -4.9%) vs baseline: +4.9%
✅ bytearray_noaspect: Time ✅ 0.484µs (SLO: <10.000µs 📉 -95.2%) vs baseline: +0.1%; Memory ✅ 39.440MB (SLO: <41.500MB -5.0%) vs baseline: +4.8%
✅ bytes_aspect: Time ✅ 1.283µs (SLO: <10.000µs 📉 -87.2%) vs baseline: +0.1%; Memory ✅ 39.420MB (SLO: <41.500MB -5.0%) vs baseline: +4.6%
✅ bytes_noaspect: Time ✅ 0.495µs (SLO: <10.000µs 📉 -95.1%) vs baseline: -0.2%; Memory ✅ 39.872MB (SLO: <41.500MB -3.9%) vs baseline: +5.7%
✅ bytesio_aspect: Time ✅ 1.314µs (SLO: <10.000µs 📉 -86.9%) vs baseline: +1.4%; Memory ✅ 39.852MB (SLO: <41.500MB -4.0%) vs baseline: +5.0%
✅ bytesio_noaspect: Time ✅ 0.499µs (SLO: <10.000µs 📉 -95.0%) vs baseline: +0.6%; Memory ✅ 39.833MB (SLO: <41.500MB -4.0%) vs baseline: +5.8%
✅ capitalize_aspect: Time ✅ 0.733µs (SLO: <10.000µs 📉 -92.7%) vs baseline: -1.4%; Memory ✅ 39.577MB (SLO: <41.500MB -4.6%) vs baseline: +5.3%
✅ capitalize_noaspect: Time ✅ 0.437µs (SLO: <10.000µs 📉 -95.6%) vs baseline: +0.5%; Memory ✅ 39.361MB (SLO: <41.500MB -5.2%) vs baseline: +4.5%
✅ casefold_aspect: Time ✅ 0.737µs (SLO: <10.000µs 📉 -92.6%) vs baseline: +0.3%; Memory ✅ 39.911MB (SLO: <41.500MB -3.8%) vs baseline: +4.9%
✅ casefold_noaspect: Time ✅ 0.365µs (SLO: <10.000µs 📉 -96.3%) vs baseline: -0.7%; Memory ✅ 39.872MB (SLO: <41.500MB -3.9%) vs baseline: +6.0%
✅ decode_aspect: Time ✅ 0.718µs (SLO: <10.000µs 📉 -92.8%) vs baseline: -1.0%; Memory ✅ 39.499MB (SLO: <41.500MB -4.8%) vs baseline: +3.9%
✅ decode_noaspect: Time ✅ 0.419µs (SLO: <10.000µs 📉 -95.8%) vs baseline: ~same; Memory ✅ 39.754MB (SLO: <41.500MB -4.2%) vs baseline: +5.7%
✅ encode_aspect: Time ✅ 0.717µs (SLO: <10.000µs 📉 -92.8%) vs baseline: +1.2%; Memory ✅ 39.459MB (SLO: <41.500MB -4.9%) vs baseline: +3.9%
✅ encode_noaspect: Time ✅ 0.398µs (SLO: <10.000µs 📉 -96.0%) vs baseline: -1.6%; Memory ✅ 39.931MB (SLO: <41.500MB -3.8%) vs baseline: +6.2%
✅ format_aspect: Time ✅ 3.479µs (SLO: <10.000µs 📉 -65.2%) vs baseline: +4.8%; Memory ✅ 39.459MB (SLO: <41.500MB -4.9%) vs baseline: +4.9%
✅ format_map_aspect: Time ✅ 3.907µs (SLO: <10.000µs 📉 -60.9%) vs baseline: 📈 +11.4%; Memory ✅ 39.518MB (SLO: <41.500MB -4.8%) vs baseline: +5.2%
✅ format_map_noaspect: Time ✅ 0.777µs (SLO: <10.000µs 📉 -92.2%) vs baseline: +0.5%; Memory ✅ 39.322MB (SLO: <41.500MB -5.2%) vs baseline: +4.4%
✅ format_noaspect: Time ✅ 0.595µs (SLO: <10.000µs 📉 -94.1%) vs baseline: -0.2%; Memory ✅ 39.793MB (SLO: <41.500MB -4.1%) vs baseline: +5.7%
✅ index_aspect: Time ✅ 0.365µs (SLO: <10.000µs 📉 -96.4%) vs baseline: +1.3%; Memory ✅ 39.440MB (SLO: <41.500MB -5.0%) vs baseline: +4.4%
✅ index_noaspect: Time ✅ 0.276µs (SLO: <10.000µs 📉 -97.2%) vs baseline: -0.8%; Memory ✅ 39.852MB (SLO: <41.500MB -4.0%) vs baseline: +6.1%
✅ join_aspect: Time ✅ 1.387µs (SLO: <10.000µs 📉 -86.1%) vs baseline: +1.1%; Memory ✅ 39.479MB (SLO: <41.500MB -4.9%) vs baseline: +4.6%
✅ join_noaspect: Time ✅ 0.494µs (SLO: <10.000µs 📉 -95.1%) vs baseline: ~same; Memory ✅ 39.420MB (SLO: <41.500MB -5.0%) vs baseline: +4.8%
✅ ljust_aspect: Time ✅ 2.491µs (SLO: <20.000µs 📉 -87.5%) vs baseline: -0.3%; Memory ✅ 39.341MB (SLO: <41.500MB -5.2%) vs baseline: +4.6%
✅ ljust_noaspect: Time ✅ 0.407µs (SLO: <10.000µs 📉 -95.9%) vs baseline: +1.1%; Memory ✅ 39.872MB (SLO: <41.500MB -3.9%) vs baseline: +6.0%
✅ lower_aspect: Time ✅ 2.169µs (SLO: <10.000µs 📉 -78.3%) vs baseline: -1.4%; Memory ✅ 39.577MB (SLO: <41.500MB -4.6%) vs baseline: +4.9%
✅ lower_noaspect: Time ✅ 0.366µs (SLO: <10.000µs 📉 -96.3%) vs baseline: -0.4%; Memory ✅ 39.892MB (SLO: <41.500MB -3.9%) vs baseline: +6.2%
✅ lstrip_aspect: Time ✅ 2.280µs (SLO: <20.000µs 📉 -88.6%) vs baseline: +3.7%; Memory ✅ 39.459MB (SLO: <41.500MB -4.9%) vs baseline: +4.9%
✅ lstrip_noaspect: Time ✅ 0.383µs (SLO: <10.000µs 📉 -96.2%) vs baseline: -0.4%; Memory ✅ 39.892MB (SLO: <41.500MB -3.9%) vs baseline: +5.7%
✅ modulo_aspect: Time ✅ 1.048µs (SLO: <10.000µs 📉 -89.5%) vs baseline: +0.3%; Memory ✅ 39.459MB (SLO: <41.500MB -4.9%) vs baseline: +3.7%
✅ modulo_aspect_for_bytearray_bytearray: Time ✅ 1.554µs (SLO: <10.000µs 📉 -84.5%) vs baseline: -0.5%; Memory ✅ 39.636MB (SLO: <41.500MB -4.5%) vs baseline: +5.2%
✅ modulo_aspect_for_bytes: Time ✅ 0.980µs (SLO: <10.000µs 📉 -90.2%) vs baseline: ~same; Memory ✅ 39.558MB (SLO: <41.500MB -4.7%) vs baseline: +4.8%
✅ modulo_aspect_for_bytes_bytearray: Time ✅ 1.242µs (SLO: <10.000µs 📉 -87.6%) vs baseline: -0.2%; Memory ✅ 39.459MB (SLO: <41.500MB -4.9%) vs baseline: +4.8%
✅ modulo_noaspect: Time ✅ 0.622µs (SLO: <10.000µs 📉 -93.8%) vs baseline: -1.7%; Memory ✅ 39.872MB (SLO: <41.500MB -3.9%) vs baseline: +6.0%
✅ replace_aspect: Time ✅ 4.868µs (SLO: <10.000µs 📉 -51.3%) vs baseline: -0.3%; Memory ✅ 39.400MB (SLO: <41.500MB -5.1%) vs baseline: +4.6%
✅ replace_noaspect: Time ✅ 0.463µs (SLO: <10.000µs 📉 -95.4%) vs baseline: +0.9%; Memory ✅ 39.872MB (SLO: <41.500MB -3.9%) vs baseline: +6.1%
✅ repr_aspect: Time ✅ 0.920µs (SLO: <10.000µs 📉 -90.8%) vs baseline: +1.7%; Memory ✅ 39.479MB (SLO: <41.500MB -4.9%) vs baseline: +4.6%
✅ repr_noaspect: Time ✅ 0.416µs (SLO: <10.000µs 📉 -95.8%) vs baseline: ~same; Memory ✅ 39.420MB (SLO: <41.500MB -5.0%) vs baseline: +4.9%
✅ rstrip_aspect: Time ✅ 1.869µs (SLO: <20.000µs 📉 -90.7%) vs baseline: -0.3%; Memory ✅ 39.518MB (SLO: <41.500MB -4.8%) vs baseline: +5.1%
✅ rstrip_noaspect: Time ✅ 0.382µs (SLO: <10.000µs 📉 -96.2%) vs baseline: +1.1%; Memory ✅ 39.813MB (SLO: <41.500MB -4.1%) vs baseline: +5.7%
✅ slice_aspect: Time ✅ 0.500µs (SLO: <10.000µs 📉 -95.0%) vs baseline: +0.6%; Memory ✅ 39.381MB (SLO: <41.500MB -5.1%) vs baseline: +4.4%
✅ slice_noaspect: Time ✅ 0.443µs (SLO: <10.000µs 📉 -95.6%) vs baseline: -0.7%; Memory ✅ 39.852MB (SLO: <41.500MB -4.0%) vs baseline: +6.1%
✅ stringio_aspect: Time ✅ 1.546µs (SLO: <10.000µs 📉 -84.5%) vs baseline: +1.5%; Memory ✅ 39.833MB (SLO: <41.500MB -4.0%) vs baseline: +4.7%
✅ stringio_noaspect: Time ✅ 0.718µs (SLO: <10.000µs 📉 -92.8%) vs baseline: -0.3%; Memory ✅ 39.793MB (SLO: <41.500MB -4.1%) vs baseline: +5.7%
✅ strip_aspect: Time ✅ 2.210µs (SLO: <20.000µs 📉 -88.9%) vs baseline: -0.3%; Memory ✅ 39.440MB (SLO: <41.500MB -5.0%) vs baseline: +4.9%
✅ strip_noaspect: Time ✅ 0.384µs (SLO: <10.000µs 📉 -96.2%) vs baseline: ~same; Memory ✅ 39.872MB (SLO: <41.500MB -3.9%) vs baseline: +6.0%
✅ swapcase_aspect: Time ✅ 2.403µs (SLO: <10.000µs 📉 -76.0%) vs baseline: +0.7%; Memory ✅ 39.499MB (SLO: <41.500MB -4.8%) vs baseline: +4.9%
✅ swapcase_noaspect: Time ✅ 0.539µs (SLO: <10.000µs 📉 -94.6%) vs baseline: ~same; Memory ✅ 39.892MB (SLO: <41.500MB -3.9%) vs baseline: +6.1%
✅ title_aspect: Time ✅ 2.363µs (SLO: <10.000µs 📉 -76.4%) vs baseline: +2.7%; Memory ✅ 39.558MB (SLO: <41.500MB -4.7%) vs baseline: +5.0%
✅ title_noaspect: Time ✅ 0.503µs (SLO: <10.000µs 📉 -95.0%) vs baseline: -1.0%; Memory ✅ 39.852MB (SLO: <41.500MB -4.0%) vs baseline: +5.8%
✅ translate_aspect: Time ✅ 3.187µs (SLO: <10.000µs 📉 -68.1%) vs baseline: -0.5%; Memory ✅ 39.499MB (SLO: <41.500MB -4.8%) vs baseline: +5.1%
✅ translate_noaspect: Time ✅ 1.046µs (SLO: <10.000µs 📉 -89.5%) vs baseline: +1.1%; Memory ✅ 39.518MB (SLO: <41.500MB -4.8%) vs baseline: +5.1%
✅ upper_aspect: Time ✅ 2.220µs (SLO: <10.000µs 📉 -77.8%) vs baseline: +0.9%; Memory ✅ 39.361MB (SLO: <41.500MB -5.2%) vs baseline: +4.7%
✅ upper_noaspect: Time ✅ 0.368µs (SLO: <10.000µs 📉 -96.3%) vs baseline: -0.9%; Memory ✅ 39.892MB (SLO: <41.500MB -3.9%) vs baseline: +6.2%

📈 iastaspectsospath - 24/24
✅ ospathbasename_aspect: Time ✅ 5.202µs (SLO: <10.000µs 📉 -48.0%) vs baseline: 📈 +21.5%; Memory ✅ 39.931MB (SLO: <41.000MB -2.6%) vs baseline: +5.3%
✅ ospathbasename_noaspect: Time ✅ 1.081µs (SLO: <10.000µs 📉 -89.2%) vs baseline: +0.3%; Memory ✅ 39.911MB (SLO: <41.000MB -2.7%) vs baseline: +5.2%
✅ ospathjoin_aspect: Time ✅ 6.949µs (SLO: <10.000µs 📉 -30.5%) vs baseline: 📈 +12.3%; Memory ✅ 39.872MB (SLO: <41.000MB -2.8%) vs baseline: +4.8%
✅ ospathjoin_noaspect: Time ✅ 2.285µs (SLO: <10.000µs 📉 -77.1%) vs baseline: -0.2%; Memory ✅ 39.892MB (SLO: <41.000MB -2.7%) vs baseline: +5.0%
✅ ospathnormcase_aspect: Time ✅ 3.988µs (SLO: <10.000µs 📉 -60.1%) vs baseline: 📈 +12.8%; Memory ✅ 39.872MB (SLO: <41.000MB -2.8%) vs baseline: +5.1%
✅ ospathnormcase_noaspect: Time ✅ 0.565µs (SLO: <10.000µs 📉 -94.3%) vs baseline: -0.7%; Memory ✅ 39.872MB (SLO: <41.000MB -2.8%) vs baseline: +4.9%
✅ ospathsplit_aspect: Time ✅ 5.783µs (SLO: <10.000µs 📉 -42.2%) vs baseline: 📈 +17.9%; Memory ✅ 39.793MB (SLO: <41.000MB -2.9%) vs baseline: +4.4%
✅ ospathsplit_noaspect: Time ✅ 1.592µs (SLO: <10.000µs 📉 -84.1%) vs baseline: +0.6%; Memory ✅ 39.852MB (SLO: <41.000MB -2.8%) vs baseline: +4.8%
✅ ospathsplitdrive_aspect: Time ✅ 4.173µs (SLO: <10.000µs 📉 -58.3%) vs baseline: 📈 +11.5%; Memory ✅ 39.852MB (SLO: <41.000MB -2.8%) vs baseline: +4.8%
✅ ospathsplitdrive_noaspect: Time ✅ 0.698µs (SLO: <10.000µs 📉 -93.0%) vs baseline: +1.1%; Memory ✅ 39.931MB (SLO: <41.000MB -2.6%) vs baseline: +5.0%
✅ ospathsplitext_aspect: Time ✅ 4.686µs (SLO: <10.000µs 📉 -53.1%) vs baseline: +1.2%; Memory ✅ 39.813MB (SLO: <41.000MB -2.9%) vs baseline: +4.5%
✅ ospathsplitext_noaspect: Time ✅ 1.384µs (SLO: <10.000µs 📉 -86.2%) vs baseline: +0.4%; Memory ✅ 39.872MB (SLO: <41.000MB -2.8%) vs baseline: +4.9%

🟡 Near SLO Breach (1 suite)

🟡 telemetryaddmetric - 30/30
✅ 1-count-metric-1-times: Time ✅ 3.030µs (SLO: <20.000µs 📉 -84.9%) vs baseline: +3.5%; Memory ✅ 34.564MB (SLO: <35.500MB -2.6%) vs baseline: +5.0%
✅ 1-count-metrics-100-times: Time ✅ 201.284µs (SLO: <220.000µs -8.5%) vs baseline: +0.7%; Memory ✅ 34.603MB (SLO: <35.500MB -2.5%) vs baseline: +4.9%
✅ 1-distribution-metric-1-times: Time ✅ 3.325µs (SLO: <20.000µs 📉 -83.4%) vs baseline: +1.2%; Memory ✅ 34.564MB (SLO: <35.500MB -2.6%) vs baseline: +4.8%
✅ 1-distribution-metrics-100-times: Time ✅ 214.721µs (SLO: <230.000µs -6.6%) vs baseline: +0.1%; Memory ✅ 34.583MB (SLO: <35.500MB -2.6%) vs baseline: +4.6%
✅ 1-gauge-metric-1-times: Time ✅ 2.213µs (SLO: <20.000µs 📉 -88.9%) vs baseline: +1.2%; Memory ✅ 34.642MB (SLO: <35.500MB -2.4%) vs baseline: +5.0%
✅ 1-gauge-metrics-100-times: Time ✅ 136.592µs (SLO: <150.000µs -8.9%) vs baseline: +0.1%; Memory ✅ 34.524MB (SLO: <35.500MB -2.7%) vs baseline: +4.6%
✅ 1-rate-metric-1-times: Time ✅ 3.119µs (SLO: <20.000µs 📉 -84.4%) vs baseline: +2.4%; Memory ✅ 34.505MB (SLO: <35.500MB -2.8%) vs baseline: +4.6%
✅ 1-rate-metrics-100-times: Time ✅ 216.704µs (SLO: <250.000µs 📉 -13.3%) vs baseline: +1.9%; Memory ✅ 34.564MB (SLO: <35.500MB -2.6%) vs baseline: +4.8%
✅ 100-count-metrics-100-times: Time ✅ 20.195ms (SLO: <22.000ms -8.2%) vs baseline: -0.9%; Memory ✅ 34.544MB (SLO: <35.500MB -2.7%) vs baseline: +4.6%
✅ 100-distribution-metrics-100-times: Time ✅ 2.260ms (SLO: <2.300ms 🟡 -1.8%) vs baseline: -1.5%; Memory ✅ 34.741MB (SLO: <35.500MB -2.1%) vs baseline: +5.2%
✅ 100-gauge-metrics-100-times: Time ✅ 1.405ms (SLO: <1.550ms -9.4%) vs baseline: ~same; Memory ✅ 34.662MB (SLO: <35.500MB -2.4%) vs baseline: +4.9%
✅ 100-rate-metrics-100-times: Time ✅ 2.194ms (SLO: <2.550ms 📉 -14.0%) vs baseline: -0.7%; Memory ✅ 34.642MB (SLO: <35.500MB -2.4%) vs baseline: +4.9%
✅ flush-1-metric: Time ✅ 4.774µs (SLO: <20.000µs 📉 -76.1%) vs baseline: +7.9%; Memory ✅ 34.564MB (SLO: <35.500MB -2.6%) vs baseline: +4.8%
✅ flush-100-metrics: Time ✅ 174.303µs (SLO: <250.000µs 📉 -30.3%) vs baseline: +1.1%; Memory ✅ 34.603MB (SLO: <35.500MB -2.5%) vs baseline: +4.7%
✅ flush-1000-metrics: Time ✅ 2.135ms (SLO: <2.500ms 📉 -14.6%) vs baseline: +1.0%; Memory ✅ 35.448MB (SLO: <36.500MB -2.9%) vs baseline: +5.0%
https://datadoghq.atlassian.net/browse/PROF-12727
Description
This PR adds profiling support for threading.Semaphore to the Lock Profiler, including critical double-counting prevention when multiple lock collectors are active simultaneously.

What
- threading.Semaphore usage
- threading.py internals

Why
Without direct Semaphore profiling:
- threading.py internals (e.g., Condition.__enter__ at line 502)

The double-counting problem:
Without prevention, one user operation generates multiple profile samples, making metrics inaccurate.
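
To make the double-counting problem concrete, here is a minimal sketch of how one user-level acquire can produce two samples when both a semaphore wrapper and a lock wrapper record events. RecordingLock, RecordingSemaphore, and the samples list are hypothetical stand-ins for illustration only; they are not the ddtrace collectors.

```python
import threading

samples = []  # stand-in for the profiler's sample buffer (hypothetical)


class RecordingLock:
    """Toy lock wrapper that records one sample per successful acquire."""

    def __init__(self, name):
        self._name = name
        self._lock = threading.Lock()

    def acquire(self, blocking=True, timeout=-1):
        acquired = self._lock.acquire(blocking, timeout)
        if acquired:
            samples.append(self._name)
        return acquired

    def release(self):
        self._lock.release()

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, *exc_info):
        self.release()


class RecordingSemaphore:
    """Toy semaphore that records its own sample and uses an instrumented lock."""

    def __init__(self, value=1):
        self._value = value
        # Mirrors the Semaphore -> Condition -> Lock chain, with a wrapped lock.
        self._cond = threading.Condition(RecordingLock("internal Lock"))

    def acquire(self):
        with self._cond:  # acquires the wrapped lock: first sample
            while self._value == 0:
                self._cond.wait()
            self._value -= 1
        samples.append("Semaphore")  # the semaphore's own sample: second one
        return True


sem = RecordingSemaphore(1)
sem.acquire()  # a single user operation
print(samples)
```

In this toy, the single sem.acquire() call leaves two entries in samples, one from the wrapped internal lock and one from the semaphore itself, which is the inflation the PR sets out to prevent.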
Changes
1. New Semaphore Collector (ddtrace/profiling/collector/threading.py)
2. Double-Counting Prevention (ddtrace/profiling/collector/_lock.py)
3. Drive-by fixes (tests/profiling_v2/collector/test_threading.py)

Testing
- test_stack_trace_points_to_user_code
- test_internal_lock_marked_correctly
- test_no_double_counting_with_lock_collector

Implementation Notes
- Semaphore → Condition → Lock chain is not visible

Performance Impact
- __slots__ field (~8% increase from optimized version)
- sys._getframe(2) call, +1 path comparison (one-time per lock)
- if self.is_internal check (negligible)

Risk
- is_internal detection (mitigated by testing)