feat(crashtracking): emit runtime stack trace for crashtracker #14765
Bootstrap import analysis
Comparison of import times between this PR and base.
Summary
The average import time from this PR is: 220 ± 4 ms.
The average import time from base is: 222 ± 4 ms.
The import time difference between this PR and base is: -1.5 ± 0.2 ms.
Import time breakdown
The following import paths have shrunk:
Performance SLOs
Comparing candidate gyuheon0h/prof-12661-runtime-stacks (9503007) with baseline main (8851ec9)

📈 Performance Regressions (2 suites)

📈 iastaspectsospath - 24/24
✅ ospathbasename_aspect
  Time: ✅ 5.249µs (SLO: <10.000µs 📉 -47.5%) vs baseline: 📈 +22.0%
  Memory: ✅ 40.187MB (SLO: <41.000MB 🟡 -2.0%) vs baseline: +4.4%
✅ ospathbasename_noaspect
  Time: ✅ 1.080µs (SLO: <10.000µs 📉 -89.2%) vs baseline: -0.6%
  Memory: ✅ 40.069MB (SLO: <41.000MB -2.3%) vs baseline: +4.3%
✅ ospathjoin_aspect
  Time: ✅ 6.205µs (SLO: <10.000µs 📉 -38.0%) vs baseline: +0.6%
  Memory: ✅ 40.246MB (SLO: <41.000MB 🟡 -1.8%) vs baseline: +4.7%
✅ ospathjoin_noaspect
  Time: ✅ 2.287µs (SLO: <10.000µs 📉 -77.1%) vs baseline: -0.5%
  Memory: ✅ 40.265MB (SLO: <41.000MB 🟡 -1.8%) vs baseline: +4.8%
✅ ospathnormcase_aspect
  Time: ✅ 3.537µs (SLO: <10.000µs 📉 -64.6%) vs baseline: ~same
  Memory: ✅ 40.226MB (SLO: <41.000MB 🟡 -1.9%) vs baseline: +4.5%
✅ ospathnormcase_noaspect
  Time: ✅ 0.569µs (SLO: <10.000µs 📉 -94.3%) vs baseline: -0.2%
  Memory: ✅ 40.246MB (SLO: <41.000MB 🟡 -1.8%) vs baseline: +5.1%
✅ ospathsplit_aspect
  Time: ✅ 4.862µs (SLO: <10.000µs 📉 -51.4%) vs baseline: ~same
  Memory: ✅ 40.167MB (SLO: <41.000MB -2.0%) vs baseline: +4.7%
✅ ospathsplit_noaspect
  Time: ✅ 1.598µs (SLO: <10.000µs 📉 -84.0%) vs baseline: -0.3%
  Memory: ✅ 40.246MB (SLO: <41.000MB 🟡 -1.8%) vs baseline: +4.6%
✅ ospathsplitdrive_aspect
  Time: ✅ 3.761µs (SLO: <10.000µs 📉 -62.4%) vs baseline: +0.9%
  Memory: ✅ 40.305MB (SLO: <41.000MB 🟡 -1.7%) vs baseline: +4.9%
✅ ospathsplitdrive_noaspect
  Time: ✅ 0.703µs (SLO: <10.000µs 📉 -93.0%) vs baseline: +0.3%
  Memory: ✅ 40.206MB (SLO: <41.000MB 🟡 -1.9%) vs baseline: +4.3%
✅ ospathsplitext_aspect
  Time: ✅ 4.635µs (SLO: <10.000µs 📉 -53.7%) vs baseline: +1.0%
  Memory: ✅ 40.442MB (SLO: <41.000MB 🟡 -1.4%) vs baseline: +4.9%
✅ ospathsplitext_noaspect
  Time: ✅ 1.384µs (SLO: <10.000µs 📉 -86.2%) vs baseline: -0.1%
  Memory: ✅ 40.147MB (SLO: <41.000MB -2.1%) vs baseline: +4.6%

📈 telemetryaddmetric - 30/30
✅ 1-count-metric-1-times
  Time: ✅ 3.390µs (SLO: <20.000µs 📉 -83.1%) vs baseline: 📈 +14.7%
  Memory: ✅ 34.800MB (SLO: <35.500MB 🟡 -2.0%) vs baseline: +4.0%
✅ 1-count-metrics-100-times
  Time: ✅ 204.001µs (SLO: <220.000µs -7.3%) vs baseline: ~same
  Memory: ✅ 34.760MB (SLO: <35.500MB -2.1%) vs baseline: +4.6%
✅ 1-distribution-metric-1-times
  Time: ✅ 3.320µs (SLO: <20.000µs 📉 -83.4%) vs baseline: +0.2%
  Memory: ✅ 34.839MB (SLO: <35.500MB 🟡 -1.9%) vs baseline: +4.5%
✅ 1-distribution-metrics-100-times
  Time: ✅ 218.440µs (SLO: <230.000µs -5.0%) vs baseline: +0.3%
  Memory: ✅ 34.760MB (SLO: <35.500MB -2.1%) vs baseline: +3.6%
✅ 1-gauge-metric-1-times
  Time: ✅ 2.191µs (SLO: <20.000µs 📉 -89.0%) vs baseline: -1.4%
  Memory: ✅ 34.741MB (SLO: <35.500MB -2.1%) vs baseline: +3.7%
✅ 1-gauge-metrics-100-times
  Time: ✅ 137.238µs (SLO: <150.000µs -8.5%) vs baseline: -0.3%
  Memory: ✅ 34.780MB (SLO: <35.500MB -2.0%) vs baseline: +4.5%
✅ 1-rate-metric-1-times
  Time: ✅ 3.084µs (SLO: <20.000µs 📉 -84.6%) vs baseline: -1.3%
  Memory: ✅ 34.800MB (SLO: <35.500MB 🟡 -2.0%) vs baseline: +4.5%
✅ 1-rate-metrics-100-times
  Time: ✅ 216.242µs (SLO: <250.000µs 📉 -13.5%) vs baseline: -0.8%
  Memory: ✅ 34.741MB (SLO: <35.500MB -2.1%) vs baseline: +4.4%
✅ 100-count-metrics-100-times
  Time: ✅ 20.490ms (SLO: <22.000ms -6.9%) vs baseline: +0.5%
  Memory: ✅ 34.701MB (SLO: <35.500MB -2.2%) vs baseline: +3.4%
✅ 100-distribution-metrics-100-times
  Time: ✅ 2.305ms (SLO: <2.300ms +0.2%) vs baseline: +1.7%
  Memory: ✅ 35.055MB (SLO: <35.500MB 🟡 -1.3%) vs baseline: +4.4%
✅ 100-gauge-metrics-100-times
  Time: ✅ 1.423ms (SLO: <1.550ms -8.2%) vs baseline: ~same
  Memory: ✅ 34.741MB (SLO: <35.500MB -2.1%) vs baseline: +3.4%
✅ 100-rate-metrics-100-times
  Time: ✅ 2.241ms (SLO: <2.550ms 📉 -12.1%) vs baseline: +1.1%
  Memory: ✅ 34.859MB (SLO: <35.500MB 🟡 -1.8%) vs baseline: +3.7%
✅ flush-1-metric
  Time: ✅ 4.567µs (SLO: <20.000µs 📉 -77.2%) vs baseline: +3.5%
  Memory: ✅ 35.075MB (SLO: <35.500MB 🟡 -1.2%) vs baseline: +4.4%
✅ flush-100-metrics
  Time: ✅ 174.235µs (SLO: <250.000µs 📉 -30.3%) vs baseline: +0.6%
  Memory: ✅ 35.154MB (SLO: <35.500MB 🟡 -1.0%) vs baseline: +4.5%
✅ flush-1000-metrics
  Time: ✅ 2.181ms (SLO: <2.500ms 📉 -12.8%) vs baseline: -0.4%
  Memory: ✅ 35.960MB (SLO: <36.500MB 🟡 -1.5%) vs baseline: +4.9%

🟡 Near SLO Breach (15 suites)
🟡 coreapiscenario - 10/10 (1 unstable)
The system-tests failures are real: "ValueError: (NOT A FLAKE) Read this quick runbook to update allowed configs: https://github.com/DataDog/system-tests/blob/main/docs/edit/runbook.md#test_config_telemetry_completeness". The telemetry tests fail as well; you probably just need to add a line that adds the new env var with its default value.
Synced offline: we will get this in first for dogfooding and deal with this after the code freeze.

Description
We want to emit runtime stacks and include this information in the crashtracker crash report. CPython has a signal-safe private internal API, _Py_DumpTracebackThreads, that we can try calling to obtain a traceback string. However, this symbol has been removed from the public internal API for Python versions 3.13+ and is not found in the CI libpython .so files for versions < 3.11. Follow-up work will be done to support all Python versions.
This feature is controlled by DD_CRASHTRACKING_EMIT_RUNTIME_STACKS.
Testing
Unit tests
3.12; available: https://dd.datad0g.com/logs?query=is_crash%3Atrue&agg_m=count&agg_m_source=base&agg_t=count&cols=host%2Cservice&event=AwAAAZqYqevuEuvgsAAAABhBWnFZcWV2dUFBQkJSUkNxc1ZoUEFBQUEAAAAkMTE5YTk4ZDUtYTkyZi00NGVlLWFhMjMtYjg3NGMzMTgwN2FjAAEu0Q&fromUser=true&messageDisplay=inline&refresh_mode=sliding&storage=hot&stream_sort=desc&viz=stream&from_ts=1763496062789&to_ts=1763510462789&live=true
3.9; not available: https://dd.datad0g.com/logs?query=is_crash%3Atrue&agg_m=count&agg_m_source=base&agg_t=count&cols=host%2Cservice&event=AwAAAZq3ClRpkN1ueAAAABhBWnEzQ2xScEFBQmRMTUlHY3ZwLW1nQUMAAAAkZjE5YWI3MGEtODUzZC00YmE5LTk5MWQtMmJjMzZiNjNhMThlAAABkg&fromUser=true&messageDisplay=inline&refresh_mode=sliding&storage=hot&stream_sort=desc&viz=stream&from_ts=1764006670175&to_ts=1764007570175&live=true
Risks
Additional Notes
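To make the idea in the Description concrete, here is a minimal Python sketch, not the code in this PR. It uses the standard-library faulthandler module, whose dump_traceback(all_threads=True) is, to my understanding, backed by the same _Py_DumpTracebackThreads routine in CPython. The helper names (runtime_stacks_enabled, symbol_is_exported, dump_all_thread_stacks) and the way the environment variable is parsed are assumptions made for illustration only. The real feature runs from native code on the crash path; this sketch only shows the gating, the symbol availability check, and what the dumped text looks like.

```python
import ctypes
import faulthandler
import os
import tempfile


def runtime_stacks_enabled(environ=os.environ):
    # Assumption: "1" or "true" (case-insensitive) enables the feature.
    # The actual parsing rules used by the tracer are not stated in the PR text.
    value = environ.get("DD_CRASHTRACKING_EMIT_RUNTIME_STACKS", "")
    return value.strip().lower() in ("1", "true")


def symbol_is_exported(name="_Py_DumpTracebackThreads"):
    # ctypes resolves symbols lazily; a missing export raises AttributeError,
    # which hasattr() turns into False.
    return hasattr(ctypes.pythonapi, name)


def dump_all_thread_stacks():
    # faulthandler writes directly to a file descriptor, so a real file is
    # needed (an in-memory io.StringIO has no fileno()).
    with tempfile.TemporaryFile() as f:
        faulthandler.dump_traceback(file=f, all_threads=True)
        f.seek(0)
        return f.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    print("symbol exported:", symbol_is_exported())
    if runtime_stacks_enabled():
        print(dump_all_thread_stacks())
```

Running this with DD_CRASHTRACKING_EMIT_RUNTIME_STACKS=true prints one traceback per thread, most recent call first. Whether symbol_is_exported() returns True depends on the interpreter build, which is the availability problem the Description mentions.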