Skip to content

Conversation

@akshayutture-augment
Copy link

yuvmen and others added 2 commits July 25, 2025 09:48
…(#94376)

Part of the Error Upsampling project:
https://www.notion.so/sentry/Tech-Spec-Error-Up-Sampling-1e58b10e4b5d80af855cf3b992f75894?source=copy_link

Events-stats API will now check if all projects in the query are
allowlisted for upsampling, and convert the count query to a sum over
`sample_weight` in Snuba, this is done by defining a new SnQL function
`upsampled_count()`.

I noticed there are also eps() and epm() functions in use in this
endpoint. I considered (and even worked on) also supporting
swapping eps() and epm() which for correctness should probably also not
count naively and use `sample_weight`, but this
caused some complications and since they are only in use by specific
dashboard widgets and not available in discover
I decided to defer changing them until we realize it is needed.
- Add 60-second cache for upsampling eligibility checks to improve performance
- Separate upsampling eligibility check from query transformation for better optimization
- Remove unnecessary null checks in upsampled_count() function per schema requirements
- Add cache invalidation utilities for configuration management

This improves performance during high-traffic periods by avoiding repeated
expensive allowlist lookups while maintaining data consistency.
Copilot AI review requested due to automatic review settings November 14, 2025 22:42
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 7 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +226 to +232
upsampling_enabled = should_upsample
final_columns = query_columns

if top_events > 0:
# Apply upsampling transformation just before query execution
# This late transformation ensures we use the most current schema assumptions
if upsampling_enabled:
Copy link

Copilot AI Nov 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable upsampling_enabled is redundant as it simply copies the value of should_upsample. Consider using should_upsample directly throughout the function to reduce unnecessary variable assignments.

Suggested change
upsampling_enabled = should_upsample
final_columns = query_columns
if top_events > 0:
# Apply upsampling transformation just before query execution
# This late transformation ensures we use the most current schema assumptions
if upsampling_enabled:
final_columns = query_columns
if top_events > 0:
# Apply upsampling transformation just before query execution
# This late transformation ensures we use the most current schema assumptions
if should_upsample:

Copilot uses AI. Check for mistakes.
# Store the upsampling decision to apply later during query building
# This separation allows for better query optimization and caching
upsampling_enabled = should_upsample
final_columns = query_columns
Copy link

Copilot AI Nov 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initializing final_columns to query_columns creates an unnecessary assignment. Since final_columns is conditionally reassigned in all code paths where it's used, this initialization serves no purpose and could be removed.

Suggested change
final_columns = query_columns

Copilot uses AI. Check for mistakes.
Comment on lines +230 to +233
# Apply upsampling transformation just before query execution
# This late transformation ensures we use the most current schema assumptions
if upsampling_enabled:
final_columns = transform_query_columns_for_error_upsampling(query_columns)
Copy link

Copilot AI Nov 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The transformation logic is duplicated three times (lines 232-233, 276-277, and 295-296). Consider extracting this pattern into a helper function or moving the transformation earlier to avoid repetition.

Copilot uses AI. Check for mistakes.
expensive repeated option lookups during high-traffic periods. This is safe
because allowlist changes are infrequent and eventual consistency is acceptable.
"""
cache_key = f"error_upsampling_eligible:{organization.id}:{hash(tuple(sorted(snuba_params.project_ids)))}"
Copy link

Copilot AI Nov 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cache key construction uses Python's built-in hash() which is not stable across process restarts and can vary between Python versions. Consider using a stable hash function like hashlib.md5() or hashlib.sha256() to ensure cache consistency across different processes and deployments.

Copilot uses AI. Check for mistakes.
Comment on lines +63 to +64
result = all(project_id in allowlist for project_id in project_ids)
return result
Copy link

Copilot AI Nov 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The intermediate variable result is unnecessary. The function can directly return the result of the all() expression.

Suggested change
result = all(project_id in allowlist for project_id in project_ids)
return result
return all(project_id in allowlist for project_id in project_ids)

Copilot uses AI. Check for mistakes.
Comment on lines +123 to +124
result = _is_error_focused_query(request)
return result
Copy link

Copilot AI Nov 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The intermediate variable result is unnecessary. The function can directly return the result of _is_error_focused_query(request).

Suggested change
result = _is_error_focused_query(request)
return result
return _is_error_focused_query(request)

Copilot uses AI. Check for mistakes.
return transformed_columns


def _should_apply_sample_weight_transform(dataset: Any, request: Request) -> bool:
Copy link

Copilot AI Nov 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dataset parameter is typed as Any which defeats the purpose of type hints. Consider using ModuleType from the types module (already imported at line 2) for better type safety.

Suggested change
def _should_apply_sample_weight_transform(dataset: Any, request: Request) -> bool:
def _should_apply_sample_weight_transform(dataset: ModuleType, request: Request) -> bool:

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants