-
Notifications
You must be signed in to change notification settings - Fork 0
feat(upsampling) - Support upsampled error count with performance optimizations #2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
…(#94376) Part of the Error Upsampling project: https://www.notion.so/sentry/Tech-Spec-Error-Up-Sampling-1e58b10e4b5d80af855cf3b992f75894?source=copy_link Events-stats API will now check if all projects in the query are allowlisted for upsampling, and convert the count query to a sum over `sample_weight` in Snuba, this is done by defining a new SnQL function `upsampled_count()`. I noticed there are also eps() and epm() functions in use in this endpoint. I considered (and even worked on) also supporting swapping eps() and epm() which for correctness should probably also not count naively and use `sample_weight`, but this caused some complications and since they are only in use by specific dashboard widgets and not available in discover I decided to defer changing them until we realize it is needed.
- Add 60-second cache for upsampling eligibility checks to improve performance - Separate upsampling eligibility check from query transformation for better optimization - Remove unnecessary null checks in upsampled_count() function per schema requirements - Add cache invalidation utilities for configuration management This improves performance during high-traffic periods by avoiding repeated expensive allowlist lookups while maintaining data consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR implements support for upsampled error count aggregations with performance optimizations. The feature allows error events to be counted using sample weights when all projects in a query are allowlisted for error upsampling, providing more accurate counts for sampled error data.
- Adds error upsampling helper module with caching and eligibility checks
- Transforms
count()aggregations toupsampled_count()for eligible error queries - Includes comprehensive test coverage for various upsampling scenarios
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
src/sentry/api/helpers/error_upsampling.py |
Core error upsampling logic with caching and transformation functions |
src/sentry/api/endpoints/organization_events_stats.py |
Integration of upsampling checks and column transformations |
src/sentry/search/events/datasets/discover.py |
New upsampled_count SnQL function definition |
src/sentry/testutils/factories.py |
Sample rate extraction from error sampling context |
tests/sentry/api/helpers/test_error_upsampling.py |
Unit tests for upsampling helper functions |
tests/snuba/api/endpoints/test_organization_events_stats.py |
Integration tests for upsampling behavior |
pyproject.toml |
Type checking configuration for new modules |
sentry-repo |
Git subproject commit reference |
Comments suppressed due to low confidence (1)
tests/snuba/api/endpoints/test_organization_events_stats.py:568
- This line overwrites the
self.userattribute that was set in the parent setUp() method (line 559), which could cause issues with authentication. Theself.authed_useris set to reference the original user, but this reassignment might cause confusion.
)
| ) | ||
| elif top_events <= 0: | ||
| return Response({"detail": "If topEvents needs to be at least 1"}, status=400) | ||
| return Response({"detail": "topEvents needs to be at least 1"}, status=400) |
Copilot
AI
Jul 26, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed typo in error message - removed duplicate 'If' from 'If topEvents needs to be at least 1'.
| expensive repeated option lookups during high-traffic periods. This is safe | ||
| because allowlist changes are infrequent and eventual consistency is acceptable. | ||
| """ | ||
| cache_key = f"error_upsampling_eligible:{organization.id}:{hash(tuple(sorted(snuba_params.project_ids)))}" |
Copilot
AI
Jul 26, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using hash() function for cache keys can be problematic as Python's hash() function is not guaranteed to be consistent across different Python processes or restarts. This could lead to cache misses and reduced performance. Consider using a deterministic hashing approach like hashlib.md5() or a simple string concatenation.
| cache_key = f"error_upsampling_eligible:{organization.id}:{hash(tuple(sorted(snuba_params.project_ids)))}" | |
| sorted_project_ids = ",".join(map(str, sorted(snuba_params.project_ids))) | |
| cache_key = f"error_upsampling_eligible:{organization.id}:{hashlib.md5(sorted_project_ids.encode()).hexdigest()}" |
| This should be called when the allowlist configuration changes to ensure | ||
| cache consistency across the system. | ||
| """ | ||
| cache_key = f"error_upsampling_eligible:{organization_id}:{hash(tuple(sorted(project_ids)))}" |
Copilot
AI
Jul 26, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same hash() consistency issue as above - this cache key should use a deterministic hashing method to ensure consistent cache behavior across processes.
| cache_key = f"error_upsampling_eligible:{organization_id}:{hash(tuple(sorted(project_ids)))}" | |
| import hashlib | |
| project_ids_hash = hashlib.md5(str(tuple(sorted(project_ids))).encode('utf-8')).hexdigest() | |
| cache_key = f"error_upsampling_eligible:{organization_id}:{project_ids_hash}" |
| upsampling_enabled = should_upsample | ||
| final_columns = query_columns | ||
|
|
||
| if top_events > 0: | ||
| # Apply upsampling transformation just before query execution | ||
| # This late transformation ensures we use the most current schema assumptions | ||
| if upsampling_enabled: |
Copilot
AI
Jul 26, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable upsampling_enabled is assigned but immediately reassigned from should_upsample without any transformation. This creates unnecessary variable duplication. Consider using should_upsample directly throughout the function.
| upsampling_enabled = should_upsample | |
| final_columns = query_columns | |
| if top_events > 0: | |
| # Apply upsampling transformation just before query execution | |
| # This late transformation ensures we use the most current schema assumptions | |
| if upsampling_enabled: | |
| final_columns = query_columns | |
| if top_events > 0: | |
| # Apply upsampling transformation just before query execution | |
| # This late transformation ensures we use the most current schema assumptions | |
| if should_upsample: |
Test 3