Skip to content

OccurrenceViewSet list: N+1 queries on detections, best_prediction, best_identification #1271

@mihow

Description

@mihow

Problem

/api/v2/occurrences/ (list) issues N+1 queries per row in the response. Under low traffic this is mostly hidden; under concurrent multi-user load it can pin gunicorn workers for tens of seconds per request and stall every other endpoint sharing the pool.

Tracked in Sentry as AMI-PLATFORM-API-BMY (performance issue type: N+1, ongoing). Most-recent observation 2026-04-28; first seen 2025-09-16. ~2,800 events.

Evidence

Sentry's representative span shows the offending query repeated per occurrence row:

SELECT "main_detection"."path"
FROM "main_detection"
WHERE ("main_detection"."occurrence_id" = %s
       AND NOT ("main_detection"."path" IS NULL))
ORDER BY "main_detection"."frame_num" ASC, "main_detection"."timestamp" ASC

Same trace shows interleaved redis (cachalot/cache-key) calls per row, amplifying the cost.

In a recent multi-user load period (4-worker gunicorn pool, ~30 concurrent UI sessions), 158 hits to this endpoint clustered into a 10-minute window with individual span durations of 34–40 seconds (max 40,417 ms). With 4 workers, ~3 concurrent occurrences requests was enough to fully exhaust the pool. UI list/detail endpoints sharing the pool (/captures/, /jobs/, /deployments/{pk}/) backed up behind it; participants experienced "every page feels frozen" during this window.

Where

ami/main/api/views.py:1217OccurrenceViewSet.get_queryset:

def get_queryset(self) -> QuerySet["Occurrence"]:
    ...
    qs = qs.select_related("determination", "deployment", "event")
    qs = qs.with_detections_count().with_timestamps()
    qs = qs.with_identifications()
    ...
    if self.action != "list":
        qs = qs.prefetch_related(
            Prefetch("detections", queryset=Detection.objects.order_by("-timestamp").select_related("source_image"))
        )
    return qs

The detections prefetch is gated on self.action != "list", so the list path has no prefetch on detections. The list serializer (OccurrenceListSerializer, ami/main/api/serializers.py:1316) then triggers per-row queries through:

  • detection_images field → Occurrence.detection_images() (ami/main/models.py:3221) — does Detection.objects.filter(occurrence=self).exclude(path=None).values_list("path"). This is the exact query Sentry detected.
  • determination_details (ami/main/api/serializers.py:1367) — already has a # @TODO convert this to query methods to avoid N+1 queries. Currently at 100+ queries per page of 10 occurrences. comment.
  • best_machine_predictionOccurrence.best_prediction (models.py:3231) — runs self.predictions().order_by(...).first() per row.
  • identifications (many=True) — needs prefetch_related("identifications") on the list queryset; right now the comment block above suggests with_identifications() annotates a count but doesn't prefetch the related rows.

best_identification (models.py:3243) is called from get_determination_details and is another per-row hit.

Proposed fix

Make the list path explicitly prefetch what the list serializer touches. Sketch:

def get_queryset(self) -> QuerySet["Occurrence"]:
    ...
    qs = qs.select_related("determination", "deployment", "event")
    qs = qs.with_detections_count().with_timestamps()

    if self.action == "list":
        # Detections needed only for `detection_images` field — limit to the columns we read.
        qs = qs.prefetch_related(
            Prefetch(
                "detections",
                queryset=Detection.objects.exclude(path=None)
                                          .order_by("frame_num", "timestamp")
                                          .only("id", "occurrence_id", "path", "frame_num", "timestamp"),
                to_attr="prefetched_detections_for_images",
            ),
            "identifications",
            # Plus whatever predictions/best_prediction needs — likely:
            # Prefetch("detections__classifications", queryset=Classification.objects.order_by("-score").select_related("taxon", "algorithm")),
        )
        # Have `detection_images()` consume `prefetched_detections_for_images` when present
        # to avoid re-querying.
    else:
        qs = qs.prefetch_related(
            Prefetch("detections", queryset=Detection.objects.order_by("-timestamp").select_related("source_image"))
        )

    qs = qs.with_identifications()
    qs = qs.apply_default_filters(project, self.request)
    return qs

Plus update Occurrence.detection_images() to prefer the prefetched attribute when present, and audit best_prediction / best_identification for whether they can be replaced by query annotations or prefetches.

Verification

  • Under assertNumQueries, fetching a page of 25 occurrences should land at a small constant number of queries (single-digit), not 25 + N×detections + N×predictions + N×identifications.
  • Sentry's own profiles for this transaction (sampled at high rate on staging) should show view.response.render time drop sharply.

Related

  • Sentry issue ID: AMI-PLATFORM-API-BMY (internal)
  • Existing TODO comment in OccurrenceListSerializer.get_determination_details: "Currently at 100+ queries per page of 10 occurrences"
  • Issue Change determination_details to determination #252 ("Change determination_details to determination") touches the same surface; can be sequenced against this if convenient.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions