-
Notifications
You must be signed in to change notification settings - Fork 0
feat(upsampling) - Support upsampled error count with performance optimizations #14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
…(#94376) Part of the Error Upsampling project: https://www.notion.so/sentry/Tech-Spec-Error-Up-Sampling-1e58b10e4b5d80af855cf3b992f75894?source=copy_link Events-stats API will now check if all projects in the query are allowlisted for upsampling, and convert the count query to a sum over `sample_weight` in Snuba, this is done by defining a new SnQL function `upsampled_count()`. I noticed there are also eps() and epm() functions in use in this endpoint. I considered (and even worked on) also supporting swapping eps() and epm() which for correctness should probably also not count naively and use `sample_weight`, but this caused some complications and since they are only in use by specific dashboard widgets and not available in discover I decided to defer changing them until we realize it is needed.
- Add 60-second cache for upsampling eligibility checks to improve performance - Separate upsampling eligibility check from query transformation for better optimization - Remove unnecessary null checks in upsampled_count() function per schema requirements - Add cache invalidation utilities for configuration management This improves performance during high-traffic periods by avoiding repeated expensive allowlist lookups while maintaining data consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 7 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| upsampling_enabled = should_upsample | ||
| final_columns = query_columns | ||
|
|
||
| if top_events > 0: | ||
| # Apply upsampling transformation just before query execution | ||
| # This late transformation ensures we use the most current schema assumptions | ||
| if upsampling_enabled: |
Copilot
AI
Nov 14, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable upsampling_enabled is redundant as it simply copies the value of should_upsample. Consider using should_upsample directly throughout the function to reduce unnecessary variable assignments.
| upsampling_enabled = should_upsample | |
| final_columns = query_columns | |
| if top_events > 0: | |
| # Apply upsampling transformation just before query execution | |
| # This late transformation ensures we use the most current schema assumptions | |
| if upsampling_enabled: | |
| final_columns = query_columns | |
| if top_events > 0: | |
| # Apply upsampling transformation just before query execution | |
| # This late transformation ensures we use the most current schema assumptions | |
| if should_upsample: |
| # Store the upsampling decision to apply later during query building | ||
| # This separation allows for better query optimization and caching | ||
| upsampling_enabled = should_upsample | ||
| final_columns = query_columns |
Copilot
AI
Nov 14, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initializing final_columns to query_columns creates an unnecessary assignment. Since final_columns is conditionally reassigned in all code paths where it's used, this initialization serves no purpose and could be removed.
| final_columns = query_columns |
| # Apply upsampling transformation just before query execution | ||
| # This late transformation ensures we use the most current schema assumptions | ||
| if upsampling_enabled: | ||
| final_columns = transform_query_columns_for_error_upsampling(query_columns) |
Copilot
AI
Nov 14, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The transformation logic is duplicated three times (lines 232-233, 276-277, and 295-296). Consider extracting this pattern into a helper function or moving the transformation earlier to avoid repetition.
| expensive repeated option lookups during high-traffic periods. This is safe | ||
| because allowlist changes are infrequent and eventual consistency is acceptable. | ||
| """ | ||
| cache_key = f"error_upsampling_eligible:{organization.id}:{hash(tuple(sorted(snuba_params.project_ids)))}" |
Copilot
AI
Nov 14, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cache key construction uses Python's built-in hash() which is not stable across process restarts and can vary between Python versions. Consider using a stable hash function like hashlib.md5() or hashlib.sha256() to ensure cache consistency across different processes and deployments.
| result = all(project_id in allowlist for project_id in project_ids) | ||
| return result |
Copilot
AI
Nov 14, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The intermediate variable result is unnecessary. The function can directly return the result of the all() expression.
| result = all(project_id in allowlist for project_id in project_ids) | |
| return result | |
| return all(project_id in allowlist for project_id in project_ids) |
| result = _is_error_focused_query(request) | ||
| return result |
Copilot
AI
Nov 14, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The intermediate variable result is unnecessary. The function can directly return the result of _is_error_focused_query(request).
| result = _is_error_focused_query(request) | |
| return result | |
| return _is_error_focused_query(request) |
| return transformed_columns | ||
|
|
||
|
|
||
| def _should_apply_sample_weight_transform(dataset: Any, request: Request) -> bool: |
Copilot
AI
Nov 14, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The dataset parameter is typed as Any which defeats the purpose of type hints. Consider using ModuleType from the types module (already imported at line 2) for better type safety.
| def _should_apply_sample_weight_transform(dataset: Any, request: Request) -> bool: | |
| def _should_apply_sample_weight_transform(dataset: ModuleType, request: Request) -> bool: |
Test 3
Replicated from ai-code-review-evaluation/sentry-copilot#2