feat(exports): add machine prediction, verification, and detection fields to exports #1213
Description
Summary
Separate machine predictions from human identifications in exports and the API so that researchers can see both side by side. Currently, the export has a single determination, which is overwritten when a human verifies the occurrence, so the original ML prediction is lost.
Design spec: docs/superpowers/specs/2026-04-07-export-fields-design.md (in ami-devops)
New Export Fields
Machine prediction fields
- best_machine_prediction_name: taxon name from the best Classification
- best_machine_prediction_algorithm: algorithm name
- best_machine_prediction_score: confidence score
Verification fields
- verified_by: username of the best (most recent non-withdrawn) identification's user
- verified_by_count: count of non-withdrawn identifications
- agreed_with_algorithm: algorithm name, if the human explicitly agreed with an ML prediction
- determination_matches_machine_prediction: boolean; does the determination taxon match the best prediction taxon?
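As a rough illustration of how the four verification fields relate to each other, here is a minimal pure-Python sketch. The Prediction and Identification dataclasses, the agreed_with_prediction link, and the choice to return None for determination_matches_machine_prediction when either taxon is missing are assumptions for illustration, not the actual Django models.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Prediction:
    # Hypothetical stand-in for the best Classification of an occurrence.
    taxon_id: int
    algorithm_name: str
    score: float


@dataclass
class Identification:
    # Hypothetical stand-in for an Identification row.
    username: str
    taxon_id: int
    created_at: datetime
    withdrawn: bool = False
    # Assumed link to the ML prediction the human explicitly agreed with, if any.
    agreed_with_prediction: Optional[Prediction] = None


def verification_fields(
    identifications: List[Identification],
    determination_taxon_id: Optional[int],
    best_prediction: Optional[Prediction],
) -> dict:
    """Derive the verification export fields for a single occurrence."""
    active = [i for i in identifications if not i.withdrawn]
    # "Best" identification = most recent non-withdrawn one.
    best = max(active, key=lambda i: i.created_at, default=None)
    agreed = best.agreed_with_prediction if best else None
    if determination_taxon_id is None or best_prediction is None:
        matches = None  # assumption: undefined when either side is missing
    else:
        matches = determination_taxon_id == best_prediction.taxon_id
    return {
        "verified_by": best.username if best else None,
        "verified_by_count": len(active),
        "agreed_with_algorithm": agreed.algorithm_name if agreed else None,
        "determination_matches_machine_prediction": matches,
    }
```

Withdrawn identifications are excluded from both verified_by and verified_by_count, so the two fields always agree on which identifications "count".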
Detection/capture fields
- best_detection_bbox: raw [x1, y1, x2, y2]
- best_detection_source_image_url: public URL to the original capture image
- best_detection_occurrence_url: platform UI link to the occurrence in context
Modifications
- determination_score: set to null when the determination comes from a human ID (the ML score is preserved in best_machine_prediction_score)
API Changes
Add a best_machine_prediction nested object to OccurrenceListSerializer. It is always populated, regardless of verification status.
Implementation Plan
Step 1: Refactor update_occurrence_determination() (model layer)
File: ami/main/models.py
- Extract find_best_prediction() → Occurrence method returning the best Classification (terminal-first, highest score)
- Extract find_best_identification() → Occurrence method returning the most recent non-withdrawn Identification
- Update update_occurrence_determination() to call both; set determination_score = None for human IDs
- Have the existing best_prediction cached_property delegate to find_best_prediction()
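The Step 1 refactor can be sketched in plain Python with stand-in dataclasses. These are hypothetical; the real methods live on the Django Occurrence model and query the database. The sketch assumes that a non-withdrawn human identification always takes precedence over ML predictions and that `terminal` is a boolean flag on each classification.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Classification:
    taxon: str
    score: float
    terminal: bool  # assumed: True for the final (leaf) classifier output
    algorithm: str


@dataclass
class Identification:
    taxon: str
    user: str
    created_at: datetime
    withdrawn: bool = False


@dataclass
class Occurrence:
    classifications: List[Classification] = field(default_factory=list)
    identifications: List[Identification] = field(default_factory=list)
    determination: Optional[str] = None
    determination_score: Optional[float] = None

    def find_best_prediction(self) -> Optional[Classification]:
        # Terminal classifications first (True > False), then highest score.
        return max(self.classifications, key=lambda c: (c.terminal, c.score), default=None)

    def find_best_identification(self) -> Optional[Identification]:
        # Most recent identification that has not been withdrawn.
        active = [i for i in self.identifications if not i.withdrawn]
        return max(active, key=lambda i: i.created_at, default=None)


def update_occurrence_determination(occ: Occurrence) -> None:
    identification = occ.find_best_identification()
    prediction = occ.find_best_prediction()
    if identification is not None:
        # Human ID wins; it carries no ML confidence, so the score is cleared.
        occ.determination = identification.taxon
        occ.determination_score = None
    elif prediction is not None:
        occ.determination = prediction.taxon
        occ.determination_score = prediction.score
    else:
        occ.determination = None
        occ.determination_score = None
```

Because find_best_prediction() only reads the classifications, the ML score stays available after a human ID overrides the determination, which is what best_machine_prediction_score relies on.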
Step 2: Add export queryset annotations
File: ami/main/models.py — OccurrenceExportManager
Extend the existing subquery annotation pattern:
- best_machine_prediction_name: Subquery, Classification → Taxon.name (ordered by -terminal, -score)
- best_machine_prediction_algorithm: Subquery, Classification → Algorithm.name
- best_machine_prediction_score: Subquery, Classification.score
- best_identification_user: Subquery, Identification → User.username (most recent non-withdrawn)
- verified_by_count: Count of non-withdrawn Identifications
- best_detection_bbox: Subquery, Detection.bbox
- best_detection_source_image_path + public_base_url: raw values for URL computation in the serializer
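To show what these correlated subqueries compute, here is a self-contained sketch using the standard-library sqlite3 module. The schema is a simplified, hypothetical one (classifications hang directly off occurrences instead of going through detections), and only a subset of the annotations is shown. In Django, each column would typically be a `Subquery(...)` over a queryset filtered with `OuterRef("pk")`, ordered, and sliced with `.values(...)[:1]`.

```python
import sqlite3

# Hypothetical, simplified schema: in the real models a Classification is
# reached from an Occurrence through a Detection; that hop is omitted here.
SCHEMA = """
CREATE TABLE taxon (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE algorithm (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE occurrence (id INTEGER PRIMARY KEY);
CREATE TABLE classification (
    id INTEGER PRIMARY KEY, occurrence_id INTEGER, taxon_id INTEGER,
    algorithm_id INTEGER, score REAL, terminal INTEGER
);
CREATE TABLE identification (
    id INTEGER PRIMARY KEY, occurrence_id INTEGER, username TEXT,
    created_at TEXT, withdrawn INTEGER
);
"""


def best_classification(column: str, join: str = "") -> str:
    """Correlated subquery returning one column of the occurrence's best
    classification: terminal first, then highest score (-terminal, -score)."""
    return (
        f"(SELECT {column} FROM classification c {join} "
        "WHERE c.occurrence_id = o.id "
        "ORDER BY c.terminal DESC, c.score DESC LIMIT 1)"
    )


BEST_NAME = best_classification("t.name", "JOIN taxon t ON t.id = c.taxon_id")
BEST_ALGORITHM = best_classification("a.name", "JOIN algorithm a ON a.id = c.algorithm_id")
BEST_SCORE = best_classification("c.score")

EXPORT_QUERY = f"""
SELECT
    o.id,
    {BEST_NAME} AS best_machine_prediction_name,
    {BEST_ALGORITHM} AS best_machine_prediction_algorithm,
    {BEST_SCORE} AS best_machine_prediction_score,
    (SELECT i.username FROM identification i
        WHERE i.occurrence_id = o.id AND i.withdrawn = 0
        ORDER BY i.created_at DESC LIMIT 1) AS best_identification_user,
    (SELECT COUNT(*) FROM identification i
        WHERE i.occurrence_id = o.id AND i.withdrawn = 0) AS verified_by_count
FROM occurrence o
ORDER BY o.id
"""


def demo() -> list:
    """Build a tiny in-memory database and return the annotated export rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executescript(
        """
        INSERT INTO taxon VALUES (1, 'Noctua pronuba'), (2, 'Noctua comes');
        INSERT INTO algorithm VALUES (1, 'binary'), (2, 'species');
        INSERT INTO occurrence VALUES (10), (11);
        -- Occurrence 10: a higher-scoring non-terminal and a terminal prediction.
        INSERT INTO classification VALUES (1, 10, 1, 1, 0.97, 0), (2, 10, 2, 2, 0.82, 1);
        -- One active and one withdrawn identification.
        INSERT INTO identification VALUES
            (1, 10, 'alice', '2024-05-01', 0), (2, 10, 'bob', '2024-06-01', 1);
        """
    )
    rows = conn.execute(EXPORT_QUERY).fetchall()
    conn.close()
    return rows
```

For occurrence 10 the terminal prediction wins despite its lower score, and the withdrawn identification is ignored; occurrence 11 has no data, so its subqueries yield NULL and the count yields 0.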
Step 3: Add fields to OccurrenceTabularSerializer
File: ami/exports/format_types.py
Add all new fields to the CSV/tabular serializer using the queryset annotations from step 2. Compute best_detection_source_image_url and best_detection_occurrence_url in serializer methods.
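A minimal sketch of the two computed URL fields, assuming each row carries the raw annotated values from step 2 plus hypothetical `id` and `project_id` keys. The UI route `projects/<project_id>/occurrences/<id>` is a placeholder, not the platform's confirmed URL scheme. In Django REST Framework these would typically be `SerializerMethodField` getters named `get_<field_name>`.

```python
from typing import Optional


def join_url(base: Optional[str], path: Optional[str]) -> Optional[str]:
    """Join a base URL and a relative path with exactly one slash between them."""
    if not base or not path:
        return None
    return f"{base.rstrip('/')}/{path.lstrip('/')}"


def best_detection_source_image_url(row: dict) -> Optional[str]:
    # Built only from the annotated raw values, so no extra query per row.
    return join_url(row.get("public_base_url"), row.get("best_detection_source_image_path"))


def best_detection_occurrence_url(row: dict, ui_base_url: str) -> Optional[str]:
    # Hypothetical UI route; the real platform path may differ.
    project_id, occurrence_id = row.get("project_id"), row.get("id")
    if project_id is None or occurrence_id is None:
        return None
    return join_url(ui_base_url, f"projects/{project_id}/occurrences/{occurrence_id}")
```

Computing URLs from annotated values keeps the export at a fixed number of queries regardless of row count.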
Step 4: Add best_machine_prediction to API serializer
File: ami/main/api/serializers.py
Add best_machine_prediction nested field to OccurrenceListSerializer using find_best_prediction().
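A sketch of the intended response shape, assuming a nested object with `taxon`, `algorithm`, and `score` keys (the exact keys are a guess). It illustrates that best_machine_prediction is built from the best Classification independently of the determination, so it survives human verification. In DRF this would plausibly be a `SerializerMethodField` or a nested serializer fed by find_best_prediction().

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Classification:
    # Hypothetical stand-in for the best Classification of an occurrence.
    taxon_name: str
    algorithm_name: str
    score: float


def serialize_best_machine_prediction(best: Optional[Classification]) -> Optional[dict]:
    """Nested API object; None only when the occurrence has no prediction at all."""
    if best is None:
        return None
    return {
        "taxon": {"name": best.taxon_name},
        "algorithm": {"name": best.algorithm_name},
        "score": best.score,
    }


def serialize_occurrence(
    occurrence_id: int,
    determination_name: Optional[str],
    determination_score: Optional[float],
    best: Optional[Classification],
) -> dict:
    # The determination may come from a human ID (score None), but the
    # machine prediction is serialized from `best` regardless.
    return {
        "id": occurrence_id,
        "determination": {"name": determination_name} if determination_name else None,
        "determination_score": determination_score,
        "best_machine_prediction": serialize_best_machine_prediction(best),
    }
```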
Step 5: Tests
Files: ami/exports/tests.py and the API tests
- ML prediction only → fields populated, verified_by null
- ML + agreeing human ID → verified_by set, determination_matches = true, determination_score = null
- ML + disagreeing human ID → determination_matches = false
- Multiple identifications → verified_by_count correct
- bbox and URL fields populated
- API: best_machine_prediction persists after human verification
Step 6: Management command for backfill
- Create a command that runs update_occurrence_determination() on all occurrences (backfills determination_score = null for human IDs)
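A minimal sketch of the backfill loop, written as plain Python so it runs on its own. The batching, the default batch size of 500, and the convention that `update()` returns True when it changed something are assumptions. In a Django management command, the iterable would typically be `Occurrence.objects.iterator(chunk_size=...)` and each batch might be wrapped in `transaction.atomic()`.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def backfill(
    occurrences: Iterable[T],
    update: Callable[[T], bool],
    batch_size: int = 500,
) -> int:
    """Call update() on every occurrence; return how many it reported as changed."""
    changed = 0
    for batch in batched(occurrences, batch_size):
        # Hypothetical: a Django command might wrap each batch in transaction.atomic().
        changed += sum(1 for occ in batch if update(occ))
    return changed
```

Returning a changed count gives the follow-up production run a simple sanity check on how many human-ID occurrences had their determination_score cleared.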
Follow-up TODOs
- Verify/add data integrity check for stale determinations
- Run backfill management command on production
🤖 Generated with Claude Code