
feat: add Darwin Core Archive (DwC-A) export format#1131

Draft
mihow wants to merge 15 commits into main from feat/dwca-export

Conversation


@mihow mihow commented Feb 11, 2026

Summary

Add a new dwca export format that produces a GBIF-compatible Darwin Core Archive (ZIP containing event.txt, occurrence.txt, meta.xml, eml.xml). Uses Event Core architecture: events are the core entity, occurrences are an extension linked via eventID. Field mappings, metadata generators, and taxonomy helpers live in a new ami/exports/dwca.py module, registered via the existing export registry and available immediately through the exports API.

This is the foundation for publishing AMI biodiversity data to GBIF. The initial implementation produces static archive downloads. Future work will add richer metadata, multimedia extensions, and eventually automated publishing via IPT integration.

List of Changes

  • Added ami/exports/dwca.py — DwC field definitions, TSV writer, meta.xml/eml.xml generators, taxonomy helpers
  • Added DwCAExporter(BaseExporter) class in ami/exports/format_types.py
  • Registered "dwca" format in ami/exports/registry.py
  • Added 10 tests in ami/exports/tests.py covering ZIP structure, headers, row counts, meta.xml, referential integrity, taxonomy extraction, EML validity, and collection filtering
  • Added docs/claude/dwca-format-reference.md — DwC-A format reference
  • Extended docs/claude/export-framework.md with API reference, DwC-A specifics, and operations guide

Related Issues

Detailed Description

Archive structure

```text
project_export.zip
├── meta.xml          ← column → DwC term URI mappings
├── eml.xml           ← dataset metadata (EML 2.1.1)
├── event.txt         ← Tab-separated Event Core (one row per sampling event)
└── occurrence.txt    ← Tab-separated Occurrence Extension (one row per occurrence)
```
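The `create_dwca_zip` step named in this PR can be illustrated with a minimal sketch; the real helper lives in `ami/exports/dwca.py` and this signature is an assumption, not the actual implementation:

```python
import zipfile


def create_dwca_zip(zip_path, event_txt_path, occurrence_txt_path, meta_xml, eml_xml):
    """Bundle the four DwC-A components into a single archive.

    The TSV components are passed as file paths, the XML documents as
    strings. Member names inside the ZIP must match the <location>
    entries declared in meta.xml.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(event_txt_path, arcname="event.txt")
        zf.write(occurrence_txt_path, arcname="occurrence.txt")
        zf.writestr("meta.xml", meta_xml)
        zf.writestr("eml.xml", eml_xml)
    return zip_path
```

If the member names and the `meta.xml` `<location>` entries drift apart, GBIF's validator reports the core or extension file as missing.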

Field mappings: Event Core (event.txt)

Each row = one AMI Event (a single trap/camera night).

| Column | DwC Term | Antenna Source | Notes |
|---|---|---|---|
| eventID | dwc:eventID | `"urn:ami:event:{project_slug}:{event.id}"` | Globally unique URN |
| eventDate | dwc:eventDate | `event.start` / `event.end` | ISO 8601 date or interval (`2024-06-15/2024-06-16`) |
| eventTime | dwc:eventTime | `event.start.time()` | HH:MM:SS |
| year | dwc:year | `event.start.year` | |
| month | dwc:month | `event.start.month` | |
| day | dwc:day | `event.start.day` | |
| samplingProtocol | dwc:samplingProtocol | `"automated light trap with camera"` | Project-level constant |
| sampleSizeValue | dwc:sampleSizeValue | `event.captures_count` | Number of images captured |
| sampleSizeUnit | dwc:sampleSizeUnit | `"images"` | |
| samplingEffort | dwc:samplingEffort | `event.end - event.start` | Formatted as `"12h 30m"` |
| locationID | dwc:locationID | `event.deployment.name` | Trap/deployment name |
| decimalLatitude | dwc:decimalLatitude | `event.deployment.latitude` | 6 decimal places |
| decimalLongitude | dwc:decimalLongitude | `event.deployment.longitude` | 6 decimal places |
| geodeticDatum | dwc:geodeticDatum | `"WGS84"` | Constant |
| datasetName | dwc:datasetName | `event.project.name` | |
| modified | dc:modified | `event.updated_at` | ISO 8601 datetime |
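The `eventDate` interval and `samplingEffort` formatting described above can be sketched as follows; the helper names are illustrative, not the actual functions in `ami/exports/dwca.py`, and the negative-duration guard is an assumption about how inconsistent data should be handled:

```python
def format_event_date(start, end):
    """ISO 8601 date, or a date interval like "2024-06-15/2024-06-16"."""
    if not start:
        return ""
    if end and end.date() != start.date():
        return f"{start.date().isoformat()}/{end.date().isoformat()}"
    return start.date().isoformat()


def format_sampling_effort(start, end):
    """Duration as "12h 30m"; empty string for missing or negative spans."""
    if not start or not end:
        return ""
    total_seconds = int((end - start).total_seconds())
    if total_seconds <= 0:  # guard against end < start (data inconsistency)
        return ""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, _ = divmod(remainder, 60)
    return f"{hours}h {minutes}m" if hours else f"{minutes}m"
```

A typical trap night starting at 20:00 and ending at 08:30 the next morning yields `"2024-06-15/2024-06-16"` and `"12h 30m"`.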

Field mappings: Occurrence Extension (occurrence.txt)

Each row = one AMI Occurrence. Only occurrences with a valid event and a determination are included (orphaned or undetermined records are excluded).

| Column | DwC Term | Antenna Source | Notes |
|---|---|---|---|
| eventID | dwc:eventID | `"urn:ami:event:{slug}:{occurrence.event_id}"` | Links to Event Core |
| occurrenceID | dwc:occurrenceID | `"urn:ami:occurrence:{slug}:{occurrence.id}"` | Globally unique URN |
| basisOfRecord | dwc:basisOfRecord | `"MachineObservation"` | All records are ML-detected |
| occurrenceStatus | dwc:occurrenceStatus | `"present"` | AMI does not record absences |
| scientificName | dwc:scientificName | `occurrence.determination.name` | Taxon name from best determination |
| taxonRank | dwc:taxonRank | `occurrence.determination.rank.lower()` | e.g. `"species"`, `"genus"` |
| kingdom | dwc:kingdom | `determination.parents_json` | Walked from precomputed parent chain |
| phylum | dwc:phylum | `determination.parents_json` | |
| class | dwc:class | `determination.parents_json` | |
| order | dwc:order | `determination.parents_json` | |
| family | dwc:family | `determination.parents_json` | |
| genus | dwc:genus | `determination.parents_json` | |
| specificEpithet | dwc:specificEpithet | Second word of scientificName | e.g. `"plana"` from `"Adaina plana"` |
| vernacularName | dwc:vernacularName | `determination.common_name_en` | English common name |
| taxonID | dwc:taxonID | `determination.gbif_taxon_key` | GBIF backbone taxon key |
| individualCount | dwc:individualCount | `"1"` | Always 1; each occurrence = one individual |
| identificationVerificationStatus | dwc:identificationVerificationStatus | `"verified"` / `"unverified"` | Based on presence of human identifications |
| modified | dc:modified | `occurrence.updated_at` | ISO 8601 datetime |
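The taxonomy columns rely on two small transformations: a rank lookup over the precomputed parent chain and a binomial split for `specificEpithet`. A simplified sketch, assuming `parents_json` is a list of `{"rank": ..., "name": ...}` dicts (the real helpers in `ami/exports/dwca.py` may differ in name and shape):

```python
def get_rank_from_parents(parents_json, rank):
    """Look up an ancestor name by rank from the precomputed parent chain.

    Avoids walking the taxon tree with per-row DB queries (N+1); the
    chain was already serialized onto the determination taxon.
    """
    for parent in parents_json or []:
        if str(parent.get("rank", "")).upper() == rank.upper():
            return parent.get("name", "")
    return ""


def specific_epithet(scientific_name):
    """Second word of a binomial, e.g. "plana" from "Adaina plana"."""
    parts = (scientific_name or "").split()
    return parts[1] if len(parts) >= 2 else ""
```

A rank-only name like `"Adaina"` (genus-level determination) produces an empty `specificEpithet`, which is the expected DwC behavior for non-species records.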

Key design decisions

| Decision | Rationale |
|---|---|
| Event Core (not Occurrence Core) | AMI records sampling effort (trap nights); Event Core is correct for sampling-event datasets |
| `basisOfRecord = "MachineObservation"` | All identifications are ML pipeline outputs |
| Taxonomy from `parents_json` | Precomputed parent chain avoids N+1 queries; no additional DB joins needed |
| `individualCount = 1` | Each Occurrence represents one individual in AMI's model; `detections_count` is bounding boxes across frames, not individuals |
| URN-format IDs | `urn:ami:event:{slug}:{id}` is globally unique, stable, and human-readable |
| Events derived from filtered occurrences | `get_events_queryset()` builds the event list from `self.queryset.values_list("event_id")`, ensuring the event core and occurrence extension stay in sync when filters are applied |
| Direct TSV writing | DwC fields are flat extractions, not nested API representations; avoids DRF serializer overhead |
| `queryset.iterator(chunk_size=500)` | Memory-efficient streaming for large exports |
| EML 2.1.1 | Current GBIF-accepted version; upgrade to 2.2.0 is a follow-up item |
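The last two decisions combine into the TSV writer: field definitions are `(term_uri, column_name, extractor)` tuples, matching the mapping tuples visible in the review comments below, and rows are streamed in batches. This is a sketch under those assumptions, not the actual `write_tsv` implementation:

```python
import csv


def write_tsv(path, fields, queryset, slug):
    """Stream rows to a TSV file without loading the full queryset.

    `fields` is a list of (term_uri, column_name, extractor) tuples;
    extractors take (obj, slug) and return a string value for one cell.
    """
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        # Header row uses the column names; term URIs go into meta.xml.
        writer.writerow([name for _, name, _ in fields])
        # iterator(chunk_size=500) streams results in batches so large
        # exports do not hold every model instance in memory at once.
        for obj in queryset.iterator(chunk_size=500):
            writer.writerow([extract(obj, slug) for _, _, extract in fields])
```

Because the same `fields` list drives both the TSV header and the `meta.xml` term mappings, column order and term declarations cannot drift apart.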

Near follow-up (before real GBIF submission)

  • Apply project default quality filters (min confidence threshold) — without this, low-confidence ML determinations get published
  • Add license field on events — GBIF requires a dataset license for reuse terms
  • Add identifiedBy / dateIdentified — provenance for who/what made the determination
  • Add associatedMedia — detection image URLs
  • Runtime validation before packaging — check for missing required fields, orphaned references
  • Multimedia extension file (multimedia.txt)
  • Upgrade EML to 2.2.0

Eventual follow-up

  • EML geographic/temporal coverage computed from actual data
  • country, stateProvince, locality on events (requires reverse geocoding)
  • coordinateUncertaintyInMeters, institutionCode, collectionCode
  • GBIF Data Validator automated integration test
  • IPT server integration

How to Test the Changes

Prerequisites

  • A running Antenna instance with at least one project containing events and ML-identified occurrences
  • API access (JWT or session auth) with Researcher or Admin role

1 · Create an export via the API

```sh
curl -X POST https://<host>/api/v2/exports/ \
  -H "Authorization: ******" \
  -H "Content-Type: application/json" \
  -d '{"project": <project_id>, "format": "dwca"}'
```

Note the id from the response, then poll until file_url is non-null:

```sh
curl https://<host>/api/v2/exports/<export_id>/ -H "Authorization: ******"
```

2 · Inspect the ZIP

```sh
curl -o export.zip "<file_url>"
unzip -l export.zip
# Expected: event.txt, occurrence.txt, meta.xml, eml.xml

unzip -p export.zip event.txt | head -3
unzip -p export.zip occurrence.txt | head -3
unzip -p export.zip meta.xml
unzip -p export.zip eml.xml
```

3 · Validate with GBIF Data Validator

Upload the ZIP to https://www.gbif.org/tools/data-validator. Expected: no critical errors.

4 · Spot-check referential integrity

```sh
unzip -p export.zip event.txt | cut -f1 | sort > /tmp/event_ids.txt
unzip -p export.zip occurrence.txt | tail -n +2 | cut -f1 | sort | uniq > /tmp/coreid_ids.txt
comm -23 /tmp/coreid_ids.txt /tmp/event_ids.txt
# → should produce no output (all coreIDs matched)
```

5 · Test with collection filter

```sh
curl -X POST https://<host>/api/v2/exports/ \
  -H "Authorization: ******" \
  -H "Content-Type: application/json" \
  -d '{"project": <project_id>, "format": "dwca", "filters": {"collection_id": <col_id>}}'
```

6 · Run automated tests

```sh
python manage.py test ami.exports.tests.DwCAExporterTests --verbosity=2
```

All 10 DwC-A tests should pass (ZIP structure, headers, row counts, meta.xml, referential integrity, taxonomy, EML validity, collection filtering).

Screenshots

N/A — backend export format, no UI changes.

Deployment Notes

No migrations or config changes required. The "dwca" format is registered automatically and appears in the API's valid format choices immediately on deployment.
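The automatic registration works through the registry's decorator pattern. A self-contained sketch of the idea — simplified and not the actual `ami/exports/registry.py` code:

```python
class ExportRegistry:
    """Minimal illustration of a format registry with decorator registration."""

    _exporters = {}

    @classmethod
    def register(cls, format_name):
        def decorator(exporter_cls):
            cls._exporters[format_name] = exporter_cls
            return exporter_cls
        return decorator

    @classmethod
    def get_exporter(cls, format_name):
        return cls._exporters[format_name]

    @classmethod
    def get_supported_formats(cls):
        return list(cls._exporters)


@ExportRegistry.register("dwca")
class DwCAExporter:
    """Stand-in for the real exporter in ami/exports/format_types.py."""
```

Because registration happens at import time, the format appears in the API's valid choices as soon as the module is loaded, with no migration or settings change.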

Checklist

  • I have tested these changes appropriately.
  • I have added and/or modified relevant tests.
  • I updated relevant documentation or comments.
  • I have verified that this PR follows the project's coding standards.
  • Any dependent changes have already been merged to main.

mihow and others added 4 commits February 11, 2026 11:45
Add planning document for Darwin Core Archive export format, export
framework technical reference, DwC-A format reference with field
mappings, and downloaded DwC terms quick reference from TDWG.

Co-Authored-By: Claude <noreply@anthropic.com>
Add Event Core + Occurrence Extension DwC-A exporter that produces a
GBIF-compatible ZIP containing event.txt, occurrence.txt, meta.xml,
and eml.xml. Events are the core entity with occurrences linked via
eventID foreign key.

Key design decisions:
- Direct TSV writing with iterator(chunk_size=500) instead of DRF serializers
- Taxonomy hierarchy extracted from parents_json to avoid N+1 queries
- meta.xml generated from the same field definitions used for TSV columns
- basisOfRecord = "MachineObservation" for all records
- URN format IDs: urn:ami:event:{slug}:{id}, urn:ami:occurrence:{slug}:{id}

Co-Authored-By: Claude <noreply@anthropic.com>
Test ZIP structure, event/occurrence headers and row counts, meta.xml
core/extension structure, referential integrity between events and
occurrences, taxonomy hierarchy extraction from parents_json, specific
epithet parsing, and EML metadata validity.

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Claude <noreply@anthropic.com>

netlify bot commented Feb 11, 2026

Deploy Preview for antenna-preview canceled.

| Name | Link |
|---|---|
| 🔨 Latest commit | c8aadb7 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/antenna-preview/deploys/698d322b1918830008eff4fe |


netlify bot commented Feb 11, 2026

Deploy Preview for antenna-ssec canceled.

| Name | Link |
|---|---|
| 🔨 Latest commit | c8aadb7 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/antenna-ssec/deploys/698d322b1bf45a0008d103f1 |


coderabbitai bot commented Feb 11, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Walkthrough

This pull request introduces a complete Darwin Core Archive (DwC-A) export format for AMI data. It adds a new exporter class, field mapping definitions, helper utilities for data transformation, metadata generation, registry integration, comprehensive test coverage, and technical documentation.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Planning & Framework Docs**<br>`.agents/planning/dwca-export-plan.md`, `docs/claude/export-framework.md` | High-level planning document for the DwC-A feature and technical reference for the Export Framework architecture, registry pattern, and runtime flow. |
| **DwC-A Core Implementation**<br>`ami/exports/dwca.py` | Implements DwC field mappings for Event and Occurrence, helper functions for date/coordinate/taxonomy formatting, TSV export logic, meta.xml and eml.xml generation, and ZIP archiving. |
| **Exporter Integration**<br>`ami/exports/format_types.py`, `ami/exports/registry.py` | Adds `DwCAExporter` class with `get_queryset`, `get_events_queryset`, and `export` methods; registers the "dwca" format in `ExportRegistry` (note: `DwCAExporter` appears duplicated in `format_types.py`). |
| **Test Suite**<br>`ami/exports/tests.py` | Comprehensive `DwCAExportTest` covering exporter registration, ZIP contents validation, header correctness, row counts, referential integrity, taxonomy extraction, and EML metadata. |
| **DwC-A Reference Documentation**<br>`docs/claude/dwca-format-reference.md` | Detailed DwC-A format specification including archive structure, meta.xml/EML schemas, DwC term mappings, and validation rules. |

Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant API as DataExport API
    participant Exporter as DwCAExporter
    participant DB as Database
    participant Files as File System
    participant Storage as ZIP Archive

    Client->>API: POST export (format=dwca)
    API->>Exporter: initialize & export()
    Exporter->>DB: get_events_queryset()
    DB-->>Exporter: Event records
    Exporter->>Files: write_tsv(event.txt)
    Files-->>Exporter: event.txt path
    Exporter->>DB: get_queryset()
    DB-->>Exporter: Occurrence records
    Exporter->>Files: write_tsv(occurrence.txt)
    Files-->>Exporter: occurrence.txt path
    Exporter->>Exporter: generate_meta_xml()
    Exporter->>Exporter: generate_eml_xml()
    Exporter->>Storage: create_dwca_zip(event.txt, occurrence.txt, meta.xml, eml.xml)
    Storage-->>Exporter: dwca.zip path
    Exporter-->>API: return ZIP path
    API-->>Client: export complete with download URL
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 A Darwin's dream in archives bright,
Events and occurrences packed just right,
Metadata nestled in XML's fold,
DwC-A stories ready to be told! 📦

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation / Resolution |
|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 79.31%, below the required 80.00% threshold. Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The PR title 'feat: add Darwin Core Archive (DwC-A) export format' clearly and concisely summarizes the main change: introducing a new export format for DwC-A archives. |
| Description check | ✅ Passed | PR description follows the template structure with Summary, Design decisions, Changes table, Test plan, Known issues, and Near/Eventual follow-ups. All required sections are present and substantive. |


@mihow mihow marked this pull request as ready for review February 11, 2026 20:34
Copilot AI review requested due to automatic review settings February 11, 2026 20:34

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 12

🤖 Fix all issues with AI agents
In `@ami/exports/dwca.py`:
- Around line 132-142: The _format_duration function must guard against negative
durations: check event.start and event.end and compute delta = event.end -
event.start, then if delta.total_seconds() <= 0 return an empty string (or
optionally log a warning) instead of proceeding; update the logic around
total_seconds / divmod so negative values are not formatted (use the check on
delta.total_seconds() before computing hours/minutes and then format only for
positive durations).
- Around line 62-63: The taxonRank mapping lambda can raise AttributeError when
o.determination.rank is None; update the DWC + "taxonRank" tuple's lambda (the
function referencing o.determination.rank.lower()) to guard by checking both
o.determination and o.determination.rank before calling .lower(), returning an
empty string (or appropriate default) when rank is falsy.
- Line 87: The mapping that emits DWC + "individualCount" currently uses
getattr(o, "detections_count", 0) which is semantically wrong; change the tuple
entry for DWC + "individualCount" so it always emits the string "1" (i.e.,
replace the lambda that references detections_count with one that returns "1")
or remove the entire tuple if you prefer omission; update the tuple containing
DWC + "individualCount" and the lambda (the anonymous function referencing
getattr(o, "detections_count", 0)) accordingly.
- Around line 75-79: The ternary expression in the tuple entry with key
"vernacularName" is ambiguous; update the lambda (lambda o, slug:
o.determination.common_name_en or "" if o.determination else "") to add
parentheses around the ternary's true branch so it reads like
(o.determination.common_name_en or "") if o.determination else "" to make the
intent explicit and easier to read.
- Around line 309-312: The EML export is inserting the owner's email into the
<surName> element (project.owner.email) which leaks PII and is semantically
wrong; update the block that creates creator -> individualName -> surName to
avoid using project.owner.email: prefer using a non-PII display name (e.g.,
project.owner.get_display_name() or project.owner.name) if available, and if no
safe display name exists omit creating the individualName/surName elements
entirely (i.e., only create creator when a valid non-email name/contact is
present). Ensure references to project.owner, creator, individualName, and
surName are updated accordingly.
- Around line 288-290: Change the naive timestamp to a timezone-aware one:
replace the use of datetime.datetime.now() when assigning now (used alongside
project_slug/slugify) with Django's timezone-aware function (timezone.now()),
and add the appropriate import from django.utils import timezone at the top of
the module so now is timezone-aware when used in exports/dwca.py.

In `@ami/exports/format_types.py`:
- Around line 196-208: The Occurrence queryset in get_queryset currently
includes occurrences lacking related event or determination, producing empty
coreid/scientificName; update get_queryset (the Occurrence.objects.valid() chain
in the get_queryset method) to filter out occurrences with null event and null
determination (e.g., .filter(event__isnull=False, determination__isnull=False))
before the select_related/.with_identifications() calls so only occurrences with
both event and determination are exported.
- Around line 237-264: The temp files created for the DwC-A export
(event_file.name, occ_file.name and the produced zip_path) are never cleaned up;
wrap the export sequence in a try/finally around the code that calls
get_events_queryset, write_tsv, generate_meta_xml, generate_eml_xml and
create_dwca_zip so that in the finally block you remove event_file.name,
occ_file.name and (on error or success) delete any intermediate temp zip if
appropriate; ensure update_export_stats and the return of zip_path still occur
on success but that file removal happens only for the intermediate files (or
conditionally for the zip if you decide not to persist it).

In `@ami/exports/tests.py`:
- Around line 477-502: The test test_taxonomy_hierarchy_extraction can pass
vacuously when taxon.parents_json is empty; update the test to guarantee
parents_json is populated before calling _get_rank_from_parents: after obtaining
taxon (from occurrence.determination) and calling
taxon.save(update_calculated_fields=True) and taxon.refresh_from_db(), add an
assertion that taxon.parents_json is truthy (or else set up explicit test data/a
fixture so parents_json contains a parent chain) so the loop over ranks actually
executes and _get_rank_from_parents is exercised.

In `@docs/claude/dwca-format-reference.md`:
- Around line 94-113: The documentation shows EML 2.2.0 namespace
("https://eml.ecoinformatics.org/eml-2.2.0") but the implementation currently
emits EML 2.1.1 (literal "eml://ecoinformatics.org/eml-2.1.1"); either make the
docs reflect 2.1.1 or update the implementation to 2.2.0 so they match. To fix:
search the codebase for the literal "eml://ecoinformatics.org/eml-2.1.1" (and
any code that constructs the eml:eml root or schemaLocation) and change the
namespace and schemaLocation to the 2.2.0 values used in the doc, or
alternatively update the doc snippet to the 2.1.1 namespace and schemaLocation;
ensure the change is applied to the EML root construction logic (the
function/method that builds the eml:eml element) and the documentation fragment
so both use the same namespace string.

In `@docs/claude/export-framework.md`:
- Line 13: Update the documentation table entry that currently lists
JSONExporter and CSVExporter to also include the new DwCAExporter: edit the row
showing `ami/exports/format_types.py` so it mentions Concrete exporters
`JSONExporter`, `CSVExporter`, and `DwCAExporter` (use the exact class name
DwCAExporter to match the implementation).
- Around line 68-71: Update the example output of
ExportRegistry.get_supported_formats() to include the newly registered "dwca"
format so the snippet accurately reflects current registrations; adjust the
example list returned by ExportRegistry.get_supported_formats() (used alongside
ExportRegistry.register and ExportRegistry.get_exporter in the snippet) to
include "dwca" among the formats (e.g., ["occurrences_api_json",
"occurrences_simple_csv", "dwca"]).
🧹 Nitpick comments (3)
docs/claude/dwca-format-reference.md (1)

9-16: Add a language identifier to the fenced code block.

The linter flags this code block for missing a language specifier (MD040). Since it's a file tree diagram, use ```text.

ami/exports/tests.py (1)

329-341: Each test re-runs the full export pipeline independently.

Every test method calls _run_export(), which creates a DataExport, runs the entire export pipeline, and writes to storage. For 7 tests that call this helper, that's significant overhead. Consider using setUpClass or caching the export result in setUp to run the export once and share the ZIP across structural validation tests.

This is fine for correctness — just a test performance observation.

ami/exports/dwca.py (1)

286-286: Unused events_queryset parameter.

events_queryset is accepted but never used in generate_eml_xml. Either remove it or use it to populate temporal/geographic coverage metadata (the latter is listed as a follow-up). For now, removing the parameter or marking it for future use avoids confusion.

Proposed fix — remove unused parameter

```diff
-def generate_eml_xml(project, events_queryset=None) -> str:
+def generate_eml_xml(project) -> str:
```

This also requires updating the call site in ami/exports/format_types.py (line 258):

```diff
-        eml_xml = generate_eml_xml(self.project, events_qs)
+        eml_xml = generate_eml_xml(self.project)
```

Comment on lines +75 to +79

```python
    (
        DWC + "vernacularName",
        "vernacularName",
        lambda o, slug: o.determination.common_name_en or "" if o.determination else "",
    ),
```

🛠️ Refactor suggestion | 🟠 Major

vernacularName lambda: add parentheses for clarity (known issue #4).

While Python's operator precedence makes this technically correct (or binds tighter than the ternary), the expression is confusing to read and a common source of bugs. Adding explicit parentheses removes ambiguity.

Proposed fix
```diff
     (
         DWC + "vernacularName",
         "vernacularName",
-        lambda o, slug: o.determination.common_name_en or "" if o.determination else "",
+        lambda o, slug: (o.determination.common_name_en or "") if o.determination else "",
     ),
```
🧰 Tools
🪛 Ruff (0.15.0)

[warning] 78-78: Unused lambda argument: slug

(ARG005)

🤖 Prompt for AI Agents
In `@ami/exports/dwca.py` around lines 75 - 79, The ternary expression in the
tuple entry with key "vernacularName" is ambiguous; update the lambda (lambda o,
slug: o.determination.common_name_en or "" if o.determination else "") to add
parentheses around the ternary's true branch so it reads like
(o.determination.common_name_en or "") if o.determination else "" to make the
intent explicit and easier to read.

Comment on lines +132 to +142
```python
def _format_duration(event) -> str:
    """Format event duration as human-readable string."""
    if not event.start or not event.end:
        return ""
    delta = event.end - event.start
    total_seconds = int(delta.total_seconds())
    hours, remainder = divmod(total_seconds, 3600)
    minutes, _ = divmod(remainder, 60)
    if hours > 0:
        return f"{hours}h {minutes}m"
    return f"{minutes}m"
```

⚠️ Potential issue | 🟡 Minor

Negative durations are not guarded against.

If event.end < event.start (data inconsistency), total_seconds will be negative, and divmod with a negative dividend produces negative results, yielding strings like "-1h -30m". Consider returning an empty string or logging a warning for negative durations.

Proposed fix
```diff
 def _format_duration(event) -> str:
     """Format event duration as human-readable string."""
     if not event.start or not event.end:
         return ""
     delta = event.end - event.start
     total_seconds = int(delta.total_seconds())
+    if total_seconds < 0:
+        return ""
     hours, remainder = divmod(total_seconds, 3600)
     minutes, _ = divmod(remainder, 60)
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```diff
 def _format_duration(event) -> str:
     """Format event duration as human-readable string."""
     if not event.start or not event.end:
         return ""
     delta = event.end - event.start
     total_seconds = int(delta.total_seconds())
+    if total_seconds < 0:
+        return ""
     hours, remainder = divmod(total_seconds, 3600)
     minutes, _ = divmod(remainder, 60)
     if hours > 0:
         return f"{hours}h {minutes}m"
     return f"{minutes}m"
```
🤖 Prompt for AI Agents
In `@ami/exports/dwca.py` around lines 132 - 142, The _format_duration function
must guard against negative durations: check event.start and event.end and
compute delta = event.end - event.start, then if delta.total_seconds() <= 0
return an empty string (or optionally log a warning) instead of proceeding;
update the logic around total_seconds / divmod so negative values are not
formatted (use the check on delta.total_seconds() before computing hours/minutes
and then format only for positive durations).

Comment on lines +288 to +290

```python
    project_slug = slugify(project.name)
    now = datetime.datetime.now().strftime("%Y-%m-%dT%H:%M:%S")
```

⚠️ Potential issue | 🟡 Minor

Use timezone-aware datetime.

datetime.datetime.now() produces a naive datetime. In a Django project with USE_TZ=True, this is inconsistent and could produce incorrect timestamps. Use Django's timezone.now().

Proposed fix
```diff
+    from django.utils import timezone
+
     project_slug = slugify(project.name)
-    now = datetime.datetime.now().strftime("%Y-%m-%dT%H:%M:%S")
+    now = timezone.now().strftime("%Y-%m-%dT%H:%M:%S")
```
📝 Committable suggestion

Suggested change

```diff
+    from django.utils import timezone
+
     project_slug = slugify(project.name)
-    now = datetime.datetime.now().strftime("%Y-%m-%dT%H:%M:%S")
+    now = timezone.now().strftime("%Y-%m-%dT%H:%M:%S")
```
🤖 Prompt for AI Agents
In `@ami/exports/dwca.py` around lines 288 - 290, Change the naive timestamp to a
timezone-aware one: replace the use of datetime.datetime.now() when assigning
now (used alongside project_slug/slugify) with Django's timezone-aware function
(timezone.now()), and add the appropriate import from django.utils import
timezone at the top of the module so now is timezone-aware when used in
exports/dwca.py.

Comment on lines +94 to +113
```xml
<?xml version="1.0" encoding="UTF-8"?>
<eml:eml xmlns:eml="https://eml.ecoinformatics.org/eml-2.2.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="https://eml.ecoinformatics.org/eml-2.2.0 https://eml.ecoinformatics.org/eml-2.2.0/eml.xsd"
packageId="urn:ami:dataset:{project_id}" system="AMI">
<dataset>
<title>{project.name}</title>
<creator>
<organizationName>{project.owner or institution}</organizationName>
</creator>
<abstract>
<para>{project.description}</para>
</abstract>
<intellectualRights>
<para>License information here</para>
</intellectualRights>
</dataset>
</eml:eml>
```

⚠️ Potential issue | 🟡 Minor

EML version mismatch between documentation and implementation.

This reference doc shows EML 2.2.0 namespace (https://eml.ecoinformatics.org/eml-2.2.0), but the actual implementation in ami/exports/dwca.py (line 293) uses EML 2.1.1 (eml://ecoinformatics.org/eml-2.1.1). The planning doc acknowledges upgrading to 2.2.0 as a follow-up, but this reference should either match the current implementation or clearly note the discrepancy.

🤖 Prompt for AI Agents
In `@docs/claude/dwca-format-reference.md` around lines 94 - 113, The
documentation shows EML 2.2.0 namespace
("https://eml.ecoinformatics.org/eml-2.2.0") but the implementation currently
emits EML 2.1.1 (literal "eml://ecoinformatics.org/eml-2.1.1"); either make the
docs reflect 2.1.1 or update the implementation to 2.2.0 so they match. To fix:
search the codebase for the literal "eml://ecoinformatics.org/eml-2.1.1" (and
any code that constructs the eml:eml root or schemaLocation) and change the
namespace and schemaLocation to the 2.2.0 values used in the doc, or
alternatively update the doc snippet to the 2.1.1 namespace and schemaLocation;
ensure the change is applied to the EML root construction logic (the
function/method that builds the eml:eml element) and the documentation fragment
so both use the same namespace string.

Comment on lines +68 to +71
```python
ExportRegistry.register("format_name")(ExporterClass)
ExportRegistry.get_exporter("format_name")   # → ExporterClass
ExportRegistry.get_supported_formats()       # → ["occurrences_api_json", "occurrences_simple_csv"]
```

⚠️ Potential issue | 🟡 Minor

Example output of get_supported_formats() is incomplete.

The example on line 70 should include "dwca" to reflect the new registration.

Proposed fix

```diff
-ExportRegistry.get_supported_formats()       # → ["occurrences_api_json", "occurrences_simple_csv"]
+ExportRegistry.get_supported_formats()       # → ["occurrences_api_json", "occurrences_simple_csv", "dwca"]
```

Copilot AI left a comment


Pull request overview

Adds a new export format (dwca) to generate a GBIF-compatible Darwin Core Archive (DwC-A) ZIP using an Event core (event.txt) with an Occurrence extension (occurrence.txt), along with generated meta.xml and eml.xml.

Changes:

  • Introduces ami/exports/dwca.py with DwC field mappings, TSV writing, and meta/eml XML generation.
  • Adds DwCAExporter to the existing export framework and registers it under "dwca".
  • Adds a new DwC-A-focused test suite validating ZIP contents and basic structure.

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 14 comments.

Show a summary per file

| File | Description |
|------|-------------|
| `ami/exports/dwca.py` | New DwC-A field definitions and archive/metadata generators. |
| `ami/exports/format_types.py` | Adds `DwCAExporter` implementation to orchestrate TSV generation + ZIP packaging. |
| `ami/exports/registry.py` | Registers the new `"dwca"` export format. |
| `ami/exports/tests.py` | Adds end-to-end tests covering ZIP structure, TSV headers/rows, and XML parsing. |
| `docs/claude/export-framework.md` | New internal technical reference for the export framework. |
| `docs/claude/dwca-format-reference.md` | New DwC-A format reference documentation. |
| `.agents/planning/dwca-export-plan.md` | Planning/design notes and known issues list for DwC-A export. |


| File | Description |
|------|---------|
| `ami/exports/base.py` | `BaseExporter` ABC - all exporters inherit from this |
| `ami/exports/registry.py` | `ExportRegistry` - maps format strings to exporter classes |
| `ami/exports/format_types.py` | Concrete exporters: `JSONExporter`, `CSVExporter` |

Copilot AI Feb 11, 2026


The “Concrete exporters” bullet lists only JSONExporter and CSVExporter, but this PR adds DwCAExporter in ami/exports/format_types.py. Update this reference so the technical doc matches the supported exporters.

Suggested change

```diff
-| `ami/exports/format_types.py` | Concrete exporters: `JSONExporter`, `CSVExporter` |
+| `ami/exports/format_types.py` | Concrete exporters: `JSONExporter`, `CSVExporter`, `DwCAExporter` |
```

Copilot uses AI. Check for mistakes.
Comment on lines +196 to +208

```python
def get_queryset(self):
    """Return the occurrence queryset (used by BaseExporter for record count)."""
    return (
        Occurrence.objects.valid()  # type: ignore[union-attr]
        .filter(project=self.project)
        .select_related(
            "determination",
            "event",
            "deployment",
        )
        .with_detections_count()
        .with_identifications()
    )
```

Copilot AI Feb 11, 2026


DwC-A occurrence exports can currently include rows with event_id or determination missing, which produces empty coreid/eventID and scientificName values in occurrence.txt (GBIF validator rejects these). Filter the queryset to exclude occurrences with null event and null determination (and adjust the expected counts in tests accordingly).

```python
    progress_callback=lambda n: self.update_job_progress(n),
)
logger.info(f"DwC-A: wrote {occ_count} occurrences")
```


Copilot AI Feb 11, 2026


Job progress for DwC-A exports is only updated every 500 occurrences; for smaller exports progress may remain at 0% even though the job completes. Consider calling update_job_progress(occ_count) once at the end (guarding against total_records == 0 to avoid division by zero in BaseExporter.update_job_progress()).

Suggested change

```python
# Ensure final progress update, even if the callback was never invoked.
# Guard against total_records == 0 to avoid division by zero in
# BaseExporter.update_job_progress().
if getattr(self, "total_records", 0):
    self.update_job_progress(occ_count)
```

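The division-by-zero concern raised above can be illustrated with a standalone sketch. `BaseExporterSketch` is a hypothetical stand-in (the real `BaseExporter` is in `ami/exports/base.py`); the point is the early return when `total_records` is zero:

```python
class BaseExporterSketch:
    """Illustrative stand-in for BaseExporter's progress tracking."""

    def __init__(self, total_records: int):
        self.total_records = total_records
        self.progress = 0.0

    def update_job_progress(self, records_done: int) -> None:
        # Empty querysets would otherwise divide by zero here.
        if not self.total_records:
            return
        self.progress = min(records_done / self.total_records, 1.0)


exporter = BaseExporterSketch(total_records=0)
exporter.update_job_progress(0)  # safe: no ZeroDivisionError
```

With this guard in place, the final `update_job_progress(occ_count)` call suggested above is safe even for exports that matched no records.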
Comment on lines +240 to +250

```python
# Column 0 is the id (eventID)
id_elem = ET.SubElement(core, "id")
id_elem.set("index", "0")

for i, (term_uri, header, _) in enumerate(event_fields):
    if i == 0:
        continue  # Already declared as <id>
    field = ET.SubElement(core, "field")
    field.set("index", str(i))
    field.set("term", term_uri)
```

Copilot AI Feb 11, 2026


In meta.xml generation, column 0 is declared as <id index="0"/> but the corresponding <field index="0" term="...eventID"/> mapping is skipped. Many DwC-A consumers/validators expect the ID column to also be mapped to the Darwin Core term (as in the DwC Text Guide examples). Include a <field> mapping for index 0 instead of skipping it.

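The fix the reviewer asks for — mapping every column, including index 0, to a DwC term — can be sketched by simply dropping the `continue`. The two-column `event_fields` list here is an illustrative subset, not the full event schema:

```python
import xml.etree.ElementTree as ET

DWC = "http://rs.tdwg.org/dwc/terms/"
# Illustrative subset of event columns: (term URI, TSV header)
event_fields = [
    (DWC + "eventID", "eventID"),
    (DWC + "eventDate", "eventDate"),
]

core = ET.Element("core")
# Declare column 0 as the row identifier...
ET.SubElement(core, "id").set("index", "0")
# ...and still emit a <field> mapping for every column, index 0 included,
# so validators can resolve both the <id> and the eventID term.
for i, (term_uri, _header) in enumerate(event_fields):
    field = ET.SubElement(core, "field")
    field.set("index", str(i))
    field.set("term", term_uri)

print(ET.tostring(core, encoding="unicode"))
```

The resulting `<core>` element carries both `<id index="0"/>` and `<field index="0" term=".../eventID"/>`, matching the DwC Text Guide examples the comment cites.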
mihow and others added 2 commits February 11, 2026 13:33
- Guard taxonRank against None rank (AttributeError on .lower())
- Guard _get_rank_from_parents against None det_rank
- Fix vernacularName ternary precedence with explicit parentheses
- Change individualCount to emit "1" (each occurrence = 1 individual,
  not detections_count which counts bounding boxes)
- Guard _format_duration against negative durations

Co-Authored-By: Claude <noreply@anthropic.com>
- Filter out occurrences with null event or determination from DwC-A
  export queryset (GBIF rejects empty coreid/scientificName)
- Replace project.owner.email with project.owner.name in EML creator
  element to avoid leaking PII in downloadable archives
- Only emit individualName when owner has a name set

Co-Authored-By: Claude <noreply@anthropic.com>
@mihow mihow marked this pull request as draft February 11, 2026 21:51
mihow and others added 8 commits February 11, 2026 15:26
- Wrap DwC-A export in try/finally to clean up intermediate temp files
- Use timezone.now() instead of naive datetime.datetime.now()
- Use full EML schemaLocation URL for GBIF validation compatibility
- Remove unused events_queryset parameter from generate_eml_xml
- Simplify progress_callback lambda to direct method reference

Co-Authored-By: Claude <noreply@anthropic.com>
- Update occurrence row count test to match null-filtered queryset
- Assert parents_json is populated in taxonomy hierarchy test
- Use settings.MEDIA_URL instead of hardcoded "/media/" in test helper
- Add DwCAExporter to export-framework.md file table and registry example
- Fix EML version in dwca-format-reference.md to match implementation (2.1.1)

Co-Authored-By: Claude <noreply@anthropic.com>
…pdate

- Map all columns (including index 0) to DwC term URIs in meta.xml
  so GBIF validators can resolve both <id>/<coreid> and <field> entries
- Change fieldsEnclosedBy from empty string to double-quote character
  for stricter parser compatibility
- Add final progress update after TSV writing so small exports (<500
  records) report completion instead of staying at 0%

Co-Authored-By: Claude <noreply@anthropic.com>
Run the export pipeline once in setUpClass and share the ZIP across all
structural validation tests instead of re-running per test method. This
reduces test time from ~7 export runs to 1.

Co-Authored-By: Claude <noreply@anthropic.com>
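The share-one-export pattern described in this commit looks roughly like the following. This sketch uses plain `unittest` and a placeholder file; the real suite uses Django's `TestCase` and runs the actual export pipeline in `setUpClass`:

```python
import pathlib
import tempfile
import unittest


class DwCAArchiveTests(unittest.TestCase):
    """Sketch: build the expensive artifact once, share it across all tests."""

    @classmethod
    def setUpClass(cls):
        super().setUpClass()
        # Stand-in for the real export pipeline run (done once, not per test).
        cls.zip_path = pathlib.Path(tempfile.mkdtemp()) / "project_export.zip"
        cls.zip_path.write_bytes(b"PK\x03\x04")  # placeholder bytes, not a real archive

    def test_archive_exists(self):
        self.assertTrue(self.zip_path.exists())

    def test_archive_nonempty(self):
        self.assertGreater(self.zip_path.stat().st_size, 0)
```

Because `setUpClass` runs once per class, each structural assertion reads the same archive instead of re-running the export, which is what cuts the suite from ~7 export runs to 1.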
…occurrences in DwC-A

Remove get_filter_backends() override that returned [], allowing DwCAExporter
to inherit BaseExporter's OccurrenceCollectionFilter. Update get_events_queryset()
to derive events from self.queryset instead of fetching all project events,
preventing orphaned events when collection_id filtering is active.

Co-Authored-By: Claude <noreply@anthropic.com>
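The "derive events from the filtered occurrences" rule in this commit reduces to a set operation, sketched here over plain dicts (the real code works on Django querysets; the row shapes are illustrative):

```python
# Occurrences that survived filtering (e.g. by collection_id).
occurrences = [
    {"occurrenceID": "occ-1", "eventID": "ev-1"},
    {"occurrenceID": "occ-2", "eventID": "ev-1"},
    {"occurrenceID": "occ-3", "eventID": "ev-2"},
]

# Only events referenced by an exported occurrence make it into event.txt,
# so a filtered archive never contains orphaned event rows.
event_ids = sorted({o["eventID"] for o in occurrences})
print(event_ids)  # → ['ev-1', 'ev-2']
```

Deriving events this way keeps `event.txt` and `occurrence.txt` referentially consistent no matter which filter backends were active.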
…ssertion

Add test_dwca_export_with_collection_filter that verifies filtered exports
produce correct occurrence/event counts and referential integrity. Update
test_event_headers_and_row_count to expect events derived from occurrences
rather than all project events.

Co-Authored-By: Claude <noreply@anthropic.com>
Co-Authored-By: Claude <noreply@anthropic.com>
Add API methods, request/response format, filter system, DwC-A specifics,
job integration, and file lifecycle details. Remove separate export-system.md.

Co-Authored-By: Claude <noreply@anthropic.com>
@mihow mihow added the next up! label Mar 26, 2026
@mihow
Collaborator Author

mihow commented Apr 1, 2026

@copilot add a more accessible overview of all the field mappings and decisions from Antenna -> Darwin Core & GBIF in the PR description so this can be more easily reviewed & tested. Add how to test instructions.


Copilot AI commented Apr 1, 2026


Updated the PR description with:

  • Field mapping tables for both event.txt (16 columns) and occurrence.txt (18 columns) showing the Antenna model source, DwC term, and any relevant notes for each field
  • Design decisions table — explains why Event Core was chosen, why individualCount=1, how taxonomy is extracted, why events are derived from the filtered occurrence queryset, etc.
  • Step-by-step testing guide covering: API export creation, polling for completion, ZIP inspection, GBIF Data Validator upload, shell-based referential integrity check, collection filter test, and the automated test commands with expected test names

Copilot finished work on behalf of mihow April 1, 2026 06:10