
feat(exports): add machine prediction, verification, and detection fields to exports #1213

@mihow

Description


Summary

Separate machine predictions from human identifications in exports and the API so researchers can see both side by side. Currently the export has a single determination that is overwritten when a human verifies the occurrence, so the original ML prediction is lost.

Design spec: docs/superpowers/specs/2026-04-07-export-fields-design.md (in ami-devops)

New Export Fields

Machine prediction fields

  • best_machine_prediction_name — taxon name from best Classification
  • best_machine_prediction_algorithm — algorithm name
  • best_machine_prediction_score — confidence score

Verification fields

  • verified_by — username of the user who made the best (most recent non-withdrawn) identification
  • verified_by_count — count of non-withdrawn identifications
  • agreed_with_algorithm — algorithm name if human explicitly agreed with an ML prediction
  • determination_matches_machine_prediction — boolean: does the determination taxon match the best prediction taxon?
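A minimal pure-Python sketch of how the four verification columns could be derived for one occurrence. The dataclasses stand in for the Django models. The `agreed_with_prediction` link, reading `agreed_with_algorithm` from the best identification only, and leaving `determination_matches_machine_prediction` null when there is no prediction to compare against are all assumptions, not confirmed behavior.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Prediction:
    taxon: str
    algorithm: str
    score: float


@dataclass
class Identification:
    username: str
    taxon: str
    created_at: datetime
    withdrawn: bool = False
    # Hypothetical: the ML prediction the user explicitly agreed with, if any.
    agreed_with_prediction: Optional[Prediction] = None


def verification_fields(
    identifications: list[Identification],
    best_prediction: Optional[Prediction],
    determination_taxon: Optional[str],
) -> dict:
    """Derive the verification export columns for one occurrence (sketch)."""
    active = [i for i in identifications if not i.withdrawn]
    # Best identification = most recent non-withdrawn one.
    best = max(active, key=lambda i: i.created_at, default=None)
    agreed = best.agreed_with_prediction if best is not None else None
    if best_prediction is None or determination_taxon is None:
        matches = None  # assumption: nothing to compare, so leave the cell empty
    else:
        matches = determination_taxon == best_prediction.taxon
    return {
        "verified_by": best.username if best is not None else None,
        "verified_by_count": len(active),
        "agreed_with_algorithm": agreed.algorithm if agreed is not None else None,
        "determination_matches_machine_prediction": matches,
    }


# Example: one agreeing identification plus one withdrawn identification.
pred = Prediction("Actias luna", "species-classifier", 0.87)
ids = [
    Identification("alice", "Actias luna", datetime(2026, 4, 1), agreed_with_prediction=pred),
    Identification("bob", "Actias", datetime(2026, 4, 2), withdrawn=True),
]
example = verification_fields(ids, pred, determination_taxon="Actias luna")
```

The withdrawn identification is ignored for both `verified_by` and `verified_by_count`, which matches the "non-withdrawn" wording above.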

Detection/capture fields

  • best_detection_bbox — raw bounding box as [x1, y1, x2, y2]
  • best_detection_source_image_url — public URL to original capture image
  • best_detection_occurrence_url — platform UI link to occurrence in context

Modifications

  • determination_score → set to null when determination comes from a human ID (ML score preserved in best_machine_prediction_score)

API Changes

Add a best_machine_prediction nested object to OccurrenceListSerializer. It is always populated, regardless of verification status.

Implementation Plan

Step 1: Refactor update_occurrence_determination() (model layer)

File: ami/main/models.py

  • Extract find_best_prediction() → Occurrence method returning best Classification (terminal-first, highest score)
  • Extract find_best_identification() → Occurrence method returning most recent non-withdrawn Identification
  • Update update_occurrence_determination() to call both; set determination_score = None for human IDs
  • Have existing best_prediction cached_property delegate to find_best_prediction()
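The Step 1 refactor can be sketched in plain Python. The dataclasses are stand-ins for the Django models, `terminal` is assumed to mark a classification produced by the final classifier in the pipeline, and only the method and field names listed above come from this plan; the rest is illustrative. The property to preserve is that `determination_score` becomes null once a human identification wins, while `find_best_prediction()` keeps returning the ML result.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime
from functools import cached_property
from typing import Optional


@dataclass
class Classification:
    taxon: str
    algorithm: str
    score: float
    terminal: bool  # assumed: True if produced by the final classifier


@dataclass
class Identification:
    username: str
    taxon: str
    created_at: datetime
    withdrawn: bool = False


@dataclass
class Occurrence:
    predictions: list[Classification] = field(default_factory=list)
    identifications: list[Identification] = field(default_factory=list)
    determination: Optional[str] = None
    determination_score: Optional[float] = None

    def find_best_prediction(self) -> Optional[Classification]:
        # Terminal classifications first, then the highest score.
        return max(self.predictions, key=lambda c: (c.terminal, c.score), default=None)

    def find_best_identification(self) -> Optional[Identification]:
        # Most recent identification that has not been withdrawn.
        active = [i for i in self.identifications if not i.withdrawn]
        return max(active, key=lambda i: i.created_at, default=None)

    @cached_property
    def best_prediction(self) -> Optional[Classification]:
        # The existing cached property simply delegates.
        return self.find_best_prediction()


def update_occurrence_determination(occ: Occurrence) -> None:
    """Human IDs take precedence; the ML score is kept only for machine determinations."""
    human = occ.find_best_identification()
    machine = occ.find_best_prediction()
    if human is not None:
        occ.determination = human.taxon
        occ.determination_score = None  # ML score is still available via the best prediction
    elif machine is not None:
        occ.determination = machine.taxon
        occ.determination_score = machine.score
    else:
        occ.determination = None
        occ.determination_score = None


occ = Occurrence(
    predictions=[
        Classification("Lepidoptera", "order-classifier", 0.99, terminal=False),
        Classification("Actias luna", "species-classifier", 0.87, terminal=True),
    ]
)
update_occurrence_determination(occ)
machine_only = (occ.determination, occ.determination_score)  # ("Actias luna", 0.87)

occ.identifications.append(Identification("alice", "Actias luna", datetime(2026, 4, 1)))
update_occurrence_determination(occ)
after_human = (occ.determination, occ.determination_score)  # ("Actias luna", None)
```

Note that the terminal prediction wins even though the non-terminal one has a higher score, which is what "terminal-first, highest score" implies.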

Step 2: Add export queryset annotations

File: ami/main/models.py → OccurrenceExportManager

Extend the existing subquery annotation pattern:

  • best_machine_prediction_name — Subquery: Classification → Taxon.name (ordered by -terminal, -score)
  • best_machine_prediction_algorithm — Subquery: Classification → Algorithm.name
  • best_machine_prediction_score — Subquery: Classification.score
  • best_identification_user — Subquery: Identification → User.username (most recent non-withdrawn)
  • verified_by_count — Count of non-withdrawn Identifications
  • best_detection_bbox — Subquery: Detection.bbox
  • best_detection_source_image_path + public_base_url — raw values for URL computation in serializer
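To make the annotation pattern concrete, here is roughly the SQL that a Django annotation such as `Subquery(Classification.objects.filter(occurrence=OuterRef("pk")).order_by("-terminal", "-score").values("taxon__name")[:1])` compiles to, run against a throwaway in-memory SQLite schema. The lookup path and all table and column names are simplified assumptions (for example, classifications link straight to occurrences here instead of through detections), so this only illustrates the correlated-subquery shape, not the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE taxon (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE algorithm (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE occurrence (id INTEGER PRIMARY KEY);
    -- Simplified: real classifications hang off detections, not occurrences.
    CREATE TABLE classification (
        id INTEGER PRIMARY KEY, occurrence_id INTEGER, taxon_id INTEGER,
        algorithm_id INTEGER, score REAL, terminal INTEGER
    );
    CREATE TABLE identification (
        id INTEGER PRIMARY KEY, occurrence_id INTEGER, username TEXT,
        withdrawn INTEGER, created_at TEXT
    );
    INSERT INTO taxon VALUES (1, 'Actias luna'), (2, 'Lepidoptera');
    INSERT INTO algorithm VALUES (1, 'species-classifier'), (2, 'order-classifier');
    INSERT INTO occurrence VALUES (1), (2);
    INSERT INTO classification VALUES (1, 1, 2, 2, 0.99, 0), (2, 1, 1, 1, 0.87, 1);
    INSERT INTO identification VALUES
        (1, 1, 'alice', 0, '2026-04-01T10:00:00'),
        (2, 1, 'bob', 1, '2026-04-02T10:00:00');
    """
)

# One correlated subquery per exported column; "best" = terminal first, then score.
QUERY = """
SELECT
    o.id,
    (SELECT t.name FROM classification c JOIN taxon t ON t.id = c.taxon_id
      WHERE c.occurrence_id = o.id
      ORDER BY c.terminal DESC, c.score DESC LIMIT 1) AS best_machine_prediction_name,
    (SELECT a.name FROM classification c JOIN algorithm a ON a.id = c.algorithm_id
      WHERE c.occurrence_id = o.id
      ORDER BY c.terminal DESC, c.score DESC LIMIT 1) AS best_machine_prediction_algorithm,
    (SELECT c.score FROM classification c
      WHERE c.occurrence_id = o.id
      ORDER BY c.terminal DESC, c.score DESC LIMIT 1) AS best_machine_prediction_score,
    (SELECT i.username FROM identification i
      WHERE i.occurrence_id = o.id AND i.withdrawn = 0
      ORDER BY i.created_at DESC LIMIT 1) AS best_identification_user,
    (SELECT COUNT(*) FROM identification i
      WHERE i.occurrence_id = o.id AND i.withdrawn = 0) AS verified_by_count
FROM occurrence o
ORDER BY o.id
"""

rows = conn.execute(QUERY).fetchall()
```

An occurrence with no classifications or identifications simply gets NULLs and a count of 0. Computing `verified_by_count` as its own correlated `COUNT(*)`, rather than a JOIN plus `GROUP BY`, keeps it from being inflated by other joined relations in the same export query; whether that matters depends on how the real manager combines its annotations.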

Step 3: Add fields to OccurrenceTabularSerializer

File: ami/exports/format_types.py

Add all new fields to the CSV/tabular serializer using the queryset annotations from step 2. Compute best_detection_source_image_url and best_detection_occurrence_url in serializer methods.
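A minimal sketch of the two URL computations from the raw annotation values. The helper names, the `ui_base_url` setting, and the `/projects/{project_id}/occurrences/{occurrence_id}` route are hypothetical placeholders, not the platform's confirmed URL scheme.

```python
from __future__ import annotations

from typing import Optional
from urllib.parse import quote


def build_source_image_url(public_base_url: Optional[str], path: Optional[str]) -> Optional[str]:
    """Join the storage base URL and the capture's relative path (sketch)."""
    if not public_base_url or not path:
        return None  # no public storage configured, or no image path
    # Normalize slashes and percent-encode the path (quote() leaves "/" intact).
    return f"{public_base_url.rstrip('/')}/{quote(path.lstrip('/'))}"


def build_occurrence_url(ui_base_url: str, project_id: int, occurrence_id: int) -> str:
    """Link to the occurrence in the platform UI (hypothetical route)."""
    return f"{ui_base_url.rstrip('/')}/projects/{project_id}/occurrences/{occurrence_id}"
```

Plain string joining is used instead of `urllib.parse.urljoin`, because `urljoin` drops the last path segment of a base URL without a trailing slash, which is an easy way to lose a bucket prefix.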

Step 4: Add best_machine_prediction to API serializer

File: ami/main/api/serializers.py

Add best_machine_prediction nested field to OccurrenceListSerializer using find_best_prediction().
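A sketch of the value the nested field could return. Because it reads from `find_best_prediction()` rather than from the determination, it stays the same after a human verifies. The key names (`taxon`, `algorithm`, `score`) and the stub types are assumptions; in Django REST Framework this body would typically sit in a `get_best_machine_prediction(self, obj)` method backing a `SerializerMethodField`.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class Prediction:
    taxon_name: str
    algorithm_name: str
    score: float


class SupportsBestPrediction(Protocol):
    def find_best_prediction(self) -> Optional[Prediction]: ...


def serialize_best_machine_prediction(occurrence: SupportsBestPrediction) -> Optional[dict]:
    """Nested API object for the best ML prediction, independent of the determination."""
    pred = occurrence.find_best_prediction()
    if pred is None:
        return None
    return {"taxon": pred.taxon_name, "algorithm": pred.algorithm_name, "score": pred.score}


@dataclass
class StubOccurrence:
    determination: str
    prediction: Optional[Prediction]

    def find_best_prediction(self) -> Optional[Prediction]:
        return self.prediction


# A human re-identified this occurrence, but the ML prediction is still reported.
occ = StubOccurrence(
    determination="Antheraea polyphemus",
    prediction=Prediction("Actias luna", "species-classifier", 0.87),
)
payload = serialize_best_machine_prediction(occ)
```

Since this runs once per row on a list endpoint, prefetching classifications (or reusing annotations like those in Step 2) would likely be needed to avoid an extra query per occurrence.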

Step 5: Tests

File: ami/exports/tests.py, API tests

  • ML prediction only → fields populated, verified_by null
  • ML + agreeing human ID → verified_by set, determination_matches_machine_prediction = true, determination_score = null
  • ML + disagreeing human ID → determination_matches_machine_prediction = false
  • Multiple identifications → verified_by_count correct
  • bbox and URL fields populated
  • API: best_machine_prediction persists after human verification

Step 6: Management command for backfill

  • Create command to run update_occurrence_determination() on all occurrences (backfills determination_score = null for human IDs)
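The backfill mostly needs to walk every occurrence in bounded batches and report what changed. A framework-free sketch of that loop follows; `update` is a hypothetical callback that recomputes one occurrence's determination and returns whether anything changed. In an actual Django command, each batch could be wrapped in `transaction.atomic()` and the IDs streamed with `.values_list("pk", flat=True).iterator()`, but that wiring is an assumption.

```python
from __future__ import annotations

from itertools import islice
from typing import Callable, Iterable, Iterator


def batched(ids: Iterable[int], size: int) -> Iterator[list[int]]:
    """Yield consecutive lists of at most `size` IDs."""
    it = iter(ids)
    while batch := list(islice(it, size)):
        yield batch


def backfill(ids: Iterable[int], update: Callable[[int], bool], batch_size: int = 500) -> dict:
    """Re-run the determination update for every ID, counting changes (sketch)."""
    stats = {"batches": 0, "processed": 0, "changed": 0}
    for batch in batched(ids, batch_size):
        # In Django, this inner loop would be a natural transaction.atomic() boundary.
        for occurrence_id in batch:
            if update(occurrence_id):
                stats["changed"] += 1
        stats["processed"] += len(batch)
        stats["batches"] += 1
    return stats


# Example: pretend every third occurrence had a human ID with a stale ML score.
result = backfill(range(1, 1201), update=lambda occ_id: occ_id % 3 == 0)
```

Reporting the changed count gives a quick check for the production run: it should roughly match the number of occurrences whose determination came from a human identification.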

Follow-up TODOs

  • Verify/add data integrity check for stale determinations
  • Run backfill management command on production

🤖 Generated with Claude Code
