Management command for taxonomy export/import (cross-environment sync) #1187

## Problem

When syncing taxonomy between environments (production → demo, or between instances), there's no built-in way to export/import the manually curated taxonomy data. The current process requires ad hoc Django shell commands and direct SQL.

The CSV bulk import (`import_taxa`) handles genera and species from Google Sheets, but doesn't cover:
- **Special classification labels** created by ML models: `Not Identifiable`, `Not Lepidoptera`, `Not Arthropoda`
- **Common name search terms** (`search_names`) added manually to orders and key species (e.g. `Diptera` → `['Flies and Mosquitoes', 'fly']`)
- **Display names** and other metadata adjustments

## Proposal

Two management commands: `export_taxa` and `import_taxa_json` (to avoid collision with existing `import_taxa`).

### `export_taxa`

Exports taxa with non-default data (search_names, display_name overrides, special ranks) to JSON.

```bash
# Export all taxa with non-default data (search_names, display_name overrides, special ranks)
python manage.py export_taxa --output taxa_sync.json

# Export only specific ranks
python manage.py export_taxa --ranks ORDER,PHYLUM,UNKNOWN --output orders.json

# Export only taxa that have search_names set
python manage.py export_taxa --has-search-names --output searchable.json
```
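Django management commands declare their options in `BaseCommand.add_arguments(self, parser)`, where `parser` is an `argparse` parser, so the flags above might be wired up roughly as follows. This is a sketch only: the helper names `add_export_arguments` and `parse_rank_list`, making `--output` required, and upper-casing rank names are assumptions, not decisions from this issue.

```python
import argparse


def parse_rank_list(value: str) -> list[str]:
    """Split a comma-separated --ranks value such as 'ORDER,PHYLUM,UNKNOWN'."""
    # Upper-casing lets 'order,phylum' work too (assumed convenience).
    return [rank.strip().upper() for rank in value.split(",") if rank.strip()]


def add_export_arguments(parser: argparse.ArgumentParser) -> None:
    # Path of the JSON file to write (assumed required).
    parser.add_argument("--output", required=True, help="Path of the JSON file to write")
    # Optional rank filter; argparse runs the string through parse_rank_list.
    parser.add_argument(
        "--ranks",
        type=parse_rank_list,
        default=None,
        help="Comma-separated ranks to export, e.g. ORDER,PHYLUM,UNKNOWN",
    )
    # Restrict the export to taxa whose search_names list is non-empty.
    parser.add_argument(
        "--has-search-names",
        action="store_true",
        help="Only export taxa with search_names set",
    )
```

Inside the command, `add_arguments` would then just call `add_export_arguments(parser)`, and `handle()` would read `options["ranks"]`, `options["output"]` and `options["has_search_names"]`.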

**Output format:**
```json
{
  "version": 1,
  "exported_at": "2026-03-24T12:00:00Z",
  "source": "antenna.insectai.org",
  "taxa": [
    {
      "name": "Lepidoptera",
      "rank": "ORDER",
      "display_name": "Lepidoptera",
      "search_names": ["Butterflies and Moths", "moth"],
      "common_name_en": null,
      "parent_name": "Insecta",
      "parent_rank": "CLASS",
      "active": true
    },
    {
      "name": "Not Identifiable",
      "rank": "UNKNOWN",
      "display_name": "Not Identifiable",
      "search_names": [],
      "common_name_en": null,
      "parent_name": null,
      "parent_rank": null,
      "active": true
    }
  ]
}
```
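A minimal sketch of producing the payload above from plain Python objects. `TaxonRow`, `serialize_taxon` and `build_export` are hypothetical stand-ins (the real command would read model instances), and falling back from `display_name` to `name` is an assumption. The sketch applies the `search_names` normalization (`None` to `[]`) described under the gotchas and writes the parent as a `(name, rank)` reference instead of a PK.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class TaxonRow:
    # Hypothetical stand-in for a Taxon model instance (synced fields only).
    name: str
    rank: str
    display_name: Optional[str] = None
    search_names: Optional[list] = None
    common_name_en: Optional[str] = None
    parent: Optional["TaxonRow"] = None
    active: bool = True


def serialize_taxon(taxon: TaxonRow) -> dict:
    return {
        "name": taxon.name,
        "rank": taxon.rank,
        # Assumed: no override means display_name equals name.
        "display_name": taxon.display_name or taxon.name,
        # Normalize None to [] so both empty states export identically.
        "search_names": list(taxon.search_names or []),
        "common_name_en": taxon.common_name_en,
        # Parent is referenced by natural key, never by PK.
        "parent_name": taxon.parent.name if taxon.parent else None,
        "parent_rank": taxon.parent.rank if taxon.parent else None,
        "active": taxon.active,
    }


def build_export(taxa, source: str) -> dict:
    return {
        "version": 1,
        "exported_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "source": source,
        "taxa": [serialize_taxon(t) for t in taxa],
    }
```

Writing the file is then `json.dump(build_export(taxa, source), fh, indent=2)`.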

Key: identify taxa by `(name, rank)` pair, parent by `(parent_name, parent_rank)`. Don't export PKs — they differ between environments.
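Because parents are referenced by natural key, an import that creates new taxa has to create a parent before any child that points at it. One way to do that is a depth-first ordering over the file, sketched here under the assumption that any parent not present in the file already exists in the target DB. `natural_key`, `parent_key` and `order_parents_first` are hypothetical names.

```python
from typing import Optional


def natural_key(record: dict) -> tuple:
    return (record["name"], record["rank"])


def parent_key(record: dict) -> Optional[tuple]:
    if record.get("parent_name") is None:
        return None
    return (record["parent_name"], record["parent_rank"])


def order_parents_first(records: list[dict]) -> list[dict]:
    """Return records so every parent defined in the same file precedes its children.

    Parents that are not in the file are assumed to exist in the target DB already.
    """
    by_key = {natural_key(r): r for r in records}
    ordered, done, visiting = [], set(), set()

    def visit(record: dict) -> None:
        key = natural_key(record)
        if key in done:
            return
        if key in visiting:
            raise ValueError(f"Parent cycle involving {key}")
        visiting.add(key)
        pkey = parent_key(record)
        if pkey in by_key:
            # Emit the in-file parent first.
            visit(by_key[pkey])
        visiting.discard(key)
        done.add(key)
        ordered.append(record)

    for record in records:
        visit(record)
    return ordered
```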

### `import_taxa_json`

Upserts from the export JSON. Match by `(name, rank)`, create if missing, update fields if changed.

```bash
# Preview what would change (dry run)
python manage.py import_taxa_json taxa_sync.json --dry-run

# Apply
python manage.py import_taxa_json taxa_sync.json

# Only update search_names, don't create new taxa
python manage.py import_taxa_json taxa_sync.json --update-only --fields search_names
```
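The import flags could be declared the same way. A sketch, assuming `--fields` takes a comma-separated list validated against the fields listed under *Fields to sync*; `SYNCABLE_FIELDS`, `add_import_arguments` and `parse_field_list` are hypothetical names, and defaulting `--fields` to all syncable fields is an assumption.

```python
import argparse

# Assumed to mirror the "Fields to sync" list in this issue.
SYNCABLE_FIELDS = ("display_name", "search_names", "common_name_en", "active", "parent")


def parse_field_list(value: str) -> list[str]:
    fields = [f.strip() for f in value.split(",") if f.strip()]
    unknown = [f for f in fields if f not in SYNCABLE_FIELDS]
    if unknown:
        # argparse turns this into a usage error and exits.
        raise argparse.ArgumentTypeError(f"Unknown field(s): {', '.join(unknown)}")
    return fields


def add_import_arguments(parser: argparse.ArgumentParser) -> None:
    parser.add_argument("path", help="JSON file produced by export_taxa")
    parser.add_argument("--dry-run", action="store_true", help="Report changes without writing")
    parser.add_argument(
        "--update-only",
        action="store_true",
        help="Never create taxa; only update existing ones",
    )
    parser.add_argument(
        "--fields",
        type=parse_field_list,
        default=list(SYNCABLE_FIELDS),
        help="Comma-separated subset of fields to update",
    )
```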

**Dry run output:**
```
Would CREATE: Not Arthropoda (PHYLUM) - search_names: ['Not an invertebrates...']
Would UPDATE: Diptera (ORDER) - search_names: ['Flies and Mosquitoes'] → ['Flies and Mosquitoes', 'fly']
Would SKIP: Lepidoptera (ORDER) - no changes
3 taxa in file, 1 create, 1 update, 1 skip
```
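The matching and the dry-run report could be sketched as a pure function that compares the file against the current state, keyed by `(name, rank)`. Assumptions: `plan_changes` and `format_plan` are hypothetical helpers, `existing` would be built from a query against the target DB, values are already normalized (so `None` vs `[]` does not show up as a change), and `parent` comparison is left out of this sketch.

```python
from collections import Counter


def plan_changes(records, existing, fields, update_only=False):
    """Classify each exported record as CREATE, UPDATE or SKIP.

    records:  the "taxa" list from the export JSON
    existing: {(name, rank): {field: value}} for taxa already in the target DB
    fields:   which fields to compare (e.g. the --fields selection)
    """
    plan = []
    for rec in records:
        key = (rec["name"], rec["rank"])
        current = existing.get(key)
        if current is None:
            if update_only:
                plan.append(("SKIP", key, "not found (--update-only)"))
            else:
                plan.append(("CREATE", key, {f: rec.get(f) for f in fields}))
            continue
        # Only fields whose values actually differ count as an update.
        diffs = {
            f: (current.get(f), rec.get(f))
            for f in fields
            if current.get(f) != rec.get(f)
        }
        plan.append(("UPDATE", key, diffs) if diffs else ("SKIP", key, "no changes"))
    return plan


def format_plan(plan):
    """Render a plan as dry-run lines plus a one-line summary."""
    lines = []
    for action, (name, rank), detail in plan:
        if action == "UPDATE":
            desc = "; ".join(f"{f}: {old!r} → {new!r}" for f, (old, new) in detail.items())
        elif action == "CREATE":
            desc = "; ".join(f"{f}: {v!r}" for f, v in detail.items() if v not in (None, []))
        else:
            desc = detail
        lines.append(f"Would {action}: {name} ({rank}) - {desc}")
    counts = Counter(action for action, _, _ in plan)
    lines.append(
        f"{len(plan)} taxa in file, {counts['CREATE']} create, "
        f"{counts['UPDATE']} update, {counts['SKIP']} skip"
    )
    return "\n".join(lines)
```

Keeping the planning step free of DB writes lets `--dry-run` and a real run share one code path; the real run would then apply only the CREATE and UPDATE entries.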

## Implementation notes

### Matching by `(name, rank)` — the Plecoptera problem

There's a real case where `name` is ambiguous: `Plecoptera` exists as both a moth GENUS and an insect ORDER. Production disambiguates with `Plecoptera (Order)` for the order. The export/import should use `(name, rank)` as the composite key, which handles this naturally.

For duplicate names within the same rank (this shouldn't happen, but data bugs can cause it), the import should warn and skip rather than silently pick one.
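Warn-and-skip for duplicate keys could look like this: group the target DB's rows by `(name, rank)`, keep the unique keys, and report the rest. A sketch assuming rows arrive as dicts (for example from a Django `.values()` queryset); `index_by_natural_key` is a hypothetical helper. Taxa that share a name but differ in rank, like the two Plecoptera, stay separately addressable.

```python
import logging
from collections import defaultdict

logger = logging.getLogger("import_taxa_json")


def index_by_natural_key(rows):
    """Map (name, rank) -> row; keys shared by several rows are reported, not indexed."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["name"], row["rank"])].append(row)
    index, ambiguous = {}, set()
    for key, matches in groups.items():
        if len(matches) == 1:
            index[key] = matches[0]
        else:
            # Never guess which duplicate is the right one.
            ambiguous.add(key)
            logger.warning(
                "Skipping %s (%s): %d taxa share this name and rank",
                key[0], key[1], len(matches),
            )
    return index, ambiguous
```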

### Fields to sync

Export these fields (skip if they match defaults/empty):
- `search_names` (ArrayField) — the primary use case
- `display_name` — usually matches `name` but can be overridden
- `common_name_en` — English common name
- `active` — soft-delete flag
- `parent` — by name+rank reference

Don't export: `pk`, `created_at`, `updated_at`, `parents_json` (derived), `ordering`, `sort_phylogeny`, `gbif_taxon_key`, `inat_taxon_id` (external IDs are environment-specific).
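Building export records from an explicit whitelist keeps `pk`, timestamps, derived fields and external IDs out of the file by construction. A sketch under assumptions: `SYNC_FIELDS`, `has_non_default_data` and `to_sync_record` are hypothetical names, and what counts as a default (empty `search_names`, `display_name` equal to `name`, `active=True`), as well as treating `UNKNOWN` as the only special rank, are guesses to check against the model. `parent` is written separately as `parent_name`/`parent_rank`.

```python
# Whitelist: anything not listed here (pk, created_at, gbif_taxon_key, ...) is never exported.
SYNC_FIELDS = ("search_names", "display_name", "common_name_en", "active")


def has_non_default_data(row: dict, special_ranks=frozenset({"UNKNOWN"})) -> bool:
    """True if a taxon carries a curated value worth exporting (assumed defaults)."""
    if row["rank"] in special_ranks:
        return True
    if row.get("search_names"):
        return True
    if row.get("display_name") not in (None, "", row["name"]):
        return True
    if row.get("common_name_en"):
        return True
    # Soft-deleted taxa are exported so the flag propagates.
    return row.get("active", True) is False


def to_sync_record(row: dict) -> dict:
    record = {"name": row["name"], "rank": row["rank"]}
    record.update({f: row.get(f) for f in SYNC_FIELDS})
    return record
```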

### Gotchas from manual sync experience

1. **`search_names` is a Postgres ArrayField** — the empty value should be `[]`, not `None`, but some existing taxa have `None` and others have `[]`. Normalize `None` to `[]` on export.

2. **ML model labels create duplicate taxa** — models output labels such as `moth` and `nonmoth`, which auto-create taxa when they don't match existing names. The proper fix is `merge_taxa` (PR branch `feat/merge-taxa-command`), but the export/import should handle the state where duplicates exist (export both and let an admin decide which to keep).

3. **`add_genus_parents` can misparent** — if an order shares a name with a genus (Plecoptera), species get parented to the wrong taxon. The import shouldn't touch parent relationships for species — those come from `add_genus_parents`.

4. **DB connection drops on external Postgres** — long-running operations against an external DB (like demo's setup) can time out. The import should commit in batches rather than in one big transaction.

5. **`Not Identifiable` rank is `UNKNOWN`** — not a standard taxonomic rank but used by the ML pipeline. The export/import should preserve non-standard ranks.
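Gotchas 3 and 4 translate into two small rules: never include `parent` in the update set for species, and commit in fixed-size batches. A sketch with hypothetical names (`batched`, `import_in_batches`, `updatable_fields`) and an assumed batch size of 200; in the actual command each `apply_batch` call would wrap its writes in Django's `transaction.atomic()` so every batch commits on its own.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator

BATCH_SIZE = 200  # assumed; tune against the external DB's timeouts


def batched(items: Iterable, size: int) -> Iterator[list]:
    """Yield consecutive lists of at most `size` items."""
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


def import_in_batches(records, apply_batch: Callable[[list], int], size: int = BATCH_SIZE) -> int:
    """Apply records batch by batch and return how many were applied.

    In the real command, apply_batch would run its writes inside
    `with transaction.atomic():`, so a dropped connection rolls back
    at most the batch in flight instead of the whole import.
    """
    applied = 0
    for batch in batched(records, size):
        applied += apply_batch(batch)
    return applied


def updatable_fields(record: dict, fields: list[str]) -> list[str]:
    # Species parents come from add_genus_parents, so the import never touches them.
    if record["rank"] == "SPECIES":
        return [f for f in fields if f != "parent"]
    return list(fields)
```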

## Alternatives considered

- **pg_dump/pg_restore of just the taxa table** — too blunt; it carries environment-specific PKs and runs into foreign key conflicts
- **Django fixtures (dumpdata/loaddata)** — includes PKs, doesn't handle upsert
- **Extending `import_taxa` CSV format** — CSV doesn't handle array fields (search_names) cleanly
