feat: add Darwin Core Archive (DwC-A) export format by mihow · Pull Request #1131 · RolnickLab/antenna

mihow · 2026-02-11T20:05:39Z

Summary

Add a new dwca export format that produces a GBIF-compatible Darwin Core Archive (ZIP containing event.txt, occurrence.txt, meta.xml, eml.xml)
Uses Event Core architecture: events are the core entity, occurrences are an extension linked via eventID
Field mappings, metadata generators, and taxonomy helpers live in a new ami/exports/dwca.py module
Registered via the existing export registry — available immediately through the exports API

This is the foundation for publishing AMI biodiversity data to GBIF. The initial implementation produces static archive downloads. Future work will add richer metadata, multimedia extensions, and eventually automated publishing via IPT integration.

Design decisions

Direct TSV writing instead of DRF serializers — DwC fields are flat extractions, not nested API representations. Uses queryset.iterator(chunk_size=500) for memory efficiency.
Taxonomy from parents_json — Walks the pre-computed parent list to extract kingdom/phylum/class/order/family/genus, avoiding N+1 queries.
meta.xml generated from field definitions — The same field list used for TSV columns drives meta.xml generation, keeping them in sync.
basisOfRecord = "MachineObservation" for all records.
URN format IDs: urn:ami:event:{slug}:{id}, urn:ami:occurrence:{slug}:{id}.
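
A minimal sketch of how one field list can drive both the TSV header and the row values, as the design decisions above describe. The `(term_uri, header, getter)` tuple shape follows the snippets quoted in the review, but the `Event` dataclass, `EVENT_FIELDS`, and this `write_tsv` signature are illustrative assumptions, not the code in `ami/exports/dwca.py`.

```python
import csv
import io
from dataclasses import dataclass

DWC = "http://rs.tdwg.org/dwc/terms/"


@dataclass
class Event:  # hypothetical stand-in for the Django model
    id: int
    date: str


# Each entry: (term URI, column header, getter taking the record and project slug).
EVENT_FIELDS = [
    (DWC + "eventID", "eventID", lambda e, slug: f"urn:ami:event:{slug}:{e.id}"),
    (DWC + "eventDate", "eventDate", lambda e, slug: e.date),
]


def write_tsv(out, fields, records, slug):
    """Write a header row plus one tab-separated row per record; return the row count."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow([header for _, header, _ in fields])
    count = 0
    for record in records:  # in Django this could be queryset.iterator(chunk_size=500)
        writer.writerow([getter(record, slug) for _, _, getter in fields])
        count += 1
    return count


buffer = io.StringIO()
rows_written = write_tsv(buffer, EVENT_FIELDS, [Event(1, "2024-06-01")], "demo")
tsv_text = buffer.getvalue()
```

Because the header and the values come from the same list, adding a column is a one-line change and the two cannot drift apart.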

Changes

File	Change
`ami/exports/dwca.py`	New — DwC field mappings, TSV writer, meta.xml/eml.xml generators, taxonomy helpers
`ami/exports/format_types.py`	Add `DwCAExporter(BaseExporter)` class
`ami/exports/registry.py`	Register `"dwca"` format
`ami/exports/tests.py`	9 tests: ZIP structure, headers, row counts, meta.xml structure, referential integrity, taxonomy extraction, EML validity

Test plan

All 20 export tests pass (11 existing + 9 new)
Manual test: create a DwC-A export via the API, download ZIP, inspect contents
Validate with GBIF Data Validator: https://www.gbif.org/tools/data-validator

Known issues to fix before merge

Occurrences without events produce empty coreid — GBIF will reject orphaned extension rows. Need .filter(event__isnull=False) on the occurrence queryset.
Occurrences without determinations produce empty scientificName — GBIF treats this as required. Need .filter(determination__isnull=False).
individualCount semantics are wrong — detections_count is the number of bounding boxes across frames, not individuals. Each occurrence is one individual in AMI's model. Should emit 1 or omit the field.
vernacularName lambda has ambiguous operator precedence — Works by accident but should be parenthesized.
Temp files never cleaned up — event.txt, occurrence.txt, and the zip temp file leak on the worker.
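
One way the temp-file leak could be closed is a `try`/`finally` around the packaging step. The sketch below uses only the standard library; `build_archive` and its dict-of-strings input are hypothetical and stand in for the real exporter, which writes its TSV files from querysets.

```python
import os
import tempfile
import zipfile


def build_archive(files: dict[str, str]) -> str:
    """Write each member to a temp file, zip them, and always remove the intermediates."""
    temp_paths = []
    zip_path = None
    try:
        for name, text in files.items():
            tmp = tempfile.NamedTemporaryFile(
                "w", suffix=f"_{name}", delete=False, encoding="utf-8"
            )
            with tmp:
                tmp.write(text)
            temp_paths.append((name, tmp.name))

        zip_fd, zip_path = tempfile.mkstemp(suffix=".zip")
        os.close(zip_fd)
        with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
            for name, path in temp_paths:
                archive.write(path, arcname=name)
        return zip_path  # the caller owns (and later deletes) the final ZIP
    except BaseException:
        # On failure, do not leave a half-written ZIP behind either.
        if zip_path and os.path.exists(zip_path):
            os.remove(zip_path)
        raise
    finally:
        for _, path in temp_paths:
            if os.path.exists(path):
                os.remove(path)


zip_path = build_archive({"event.txt": "eventID\n", "occurrence.txt": "occurrenceID\n"})
with zipfile.ZipFile(zip_path) as archive:
    member_names = sorted(archive.namelist())
os.remove(zip_path)
```

The intermediate files are removed on both success and failure; only the final ZIP survives a successful run, and the caller decides when to delete it.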

Near follow-up (before real GBIF submission)

Apply project default filters to the occurrence queryset — without this, low-confidence ML determinations get published. Biggest data quality risk.
Add license field on events — GBIF requires a dataset license for reuse terms.
Add identifiedBy / dateIdentified — provenance for who/what made the determination.
Add associatedMedia — detection image URLs. For an image-based platform this is the primary evidence.
Runtime validation before packaging — check for missing required fields, orphaned references.
Multimedia extension file (multimedia.txt)
Species / taxa extension file
Upgrade EML to 2.2.0 — current code uses 2.1.1, GBIF recommends 2.2.0.
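
The runtime validation item above could start as a pure-Python pass over the rows before they are zipped. This is a sketch under assumptions: `validate_archive_rows` is a hypothetical helper, rows are modelled as plain dicts keyed by column header, and only the two checks named in the known issues (empty `scientificName`, orphaned `eventID`) are covered.

```python
def validate_archive_rows(events, occurrences):
    """Return a list of human-readable problems; an empty list means no issues were found."""
    problems = []
    event_ids = {row["eventID"] for row in events if row.get("eventID")}
    for index, row in enumerate(occurrences, start=1):
        if not row.get("scientificName"):
            problems.append(f"occurrence row {index}: empty scientificName")
        if row.get("eventID") not in event_ids:
            problems.append(f"occurrence row {index}: eventID not found in event.txt")
    return problems


events = [{"eventID": "urn:ami:event:demo:1"}]
occurrences = [
    {"eventID": "urn:ami:event:demo:1", "scientificName": "Actias luna"},
    {"eventID": "", "scientificName": ""},  # orphaned and undetermined
]
problems = validate_archive_rows(events, occurrences)
```

Returning a list rather than raising on the first problem lets the export job report everything that needs fixing in one pass.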

Eventual follow-up

EML geographic/temporal coverage computed from actual data (bounding box, date range)
country, stateProvince, locality on events (requires reverse geocoding or Site model fields)
coordinateUncertaintyInMeters
institutionCode, collectionCode (project-level settings)
scientificNameAuthorship from Taxon.author
eventType field
GBIF Data Validator automated integration test
IPT server integration / acting as IPT endpoint for GBIF crawling

Summary by CodeRabbit

New Features
- Added Darwin Core Archive (DwC-A) export format for standardized biodiversity data sharing.
- Exports now package event and occurrence records with complete metadata files into a single ZIP archive.
Documentation
- Added DwC-A format reference and export framework technical documentation.

Add planning document for Darwin Core Archive export format, export framework technical reference, DwC-A format reference with field mappings, and downloaded DwC terms quick reference from TDWG. Co-Authored-By: Claude <noreply@anthropic.com>

Add Event Core + Occurrence Extension DwC-A exporter that produces a GBIF-compatible ZIP containing event.txt, occurrence.txt, meta.xml, and eml.xml. Events are the core entity with occurrences linked via eventID foreign key.

Key design decisions:
- Direct TSV writing with iterator(chunk_size=500) instead of DRF serializers
- Taxonomy hierarchy extracted from parents_json to avoid N+1 queries
- meta.xml generated from the same field definitions used for TSV columns
- basisOfRecord = "MachineObservation" for all records
- URN format IDs: urn:ami:event:{slug}:{id}, urn:ami:occurrence:{slug}:{id}

Co-Authored-By: Claude <noreply@anthropic.com>

Test ZIP structure, event/occurrence headers and row counts, meta.xml core/extension structure, referential integrity between events and occurrences, taxonomy hierarchy extraction from parents_json, specific epithet parsing, and EML metadata validity. Co-Authored-By: Claude <noreply@anthropic.com>
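
The taxonomy-extraction behaviour these tests target can be sketched as follows. The shape of `parents_json` is an assumption here (a list of dicts with `rank` and `name` keys), and `rank_from_parents` is a hypothetical helper, not the PR's `_get_rank_from_parents`.

```python
RANKS = ("kingdom", "phylum", "class", "order", "family", "genus")


def rank_from_parents(parents, rank):
    """Return the name of the first parent at the requested rank, or "" if absent."""
    for parent in parents or []:
        if str(parent.get("rank", "")).lower() == rank:
            return parent.get("name", "")
    return ""


# Assumed structure: one dict per ancestor, already available without extra queries.
parents = [
    {"rank": "ORDER", "name": "Lepidoptera"},
    {"rank": "FAMILY", "name": "Saturniidae"},
    {"rank": "GENUS", "name": "Actias"},
]

# An empty parent list would make every lookup trivially return "",
# so a test should confirm the list is populated before checking ranks.
assert parents, "parents list must be populated before checking ranks"
hierarchy = {rank: rank_from_parents(parents, rank) for rank in RANKS}
```

Ranks missing from the parent chain come back as empty strings, which maps directly onto empty DwC columns.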

Co-Authored-By: Claude <noreply@anthropic.com>

netlify · 2026-02-11T20:05:44Z

✅ Deploy Preview for antenna-preview canceled.

Name	Link
🔨 Latest commit	`c8aadb7`
🔍 Latest deploy log	https://app.netlify.com/projects/antenna-preview/deploys/698d322b1918830008eff4fe

netlify · 2026-02-11T20:05:44Z

✅ Deploy Preview for antenna-ssec canceled.

Name	Link
🔨 Latest commit	`c8aadb7`
🔍 Latest deploy log	https://app.netlify.com/projects/antenna-ssec/deploys/698d322b1bf45a0008d103f1

coderabbitai · 2026-02-11T20:05:48Z

📝 Walkthrough

This pull request introduces a complete Darwin Core Archive (DwC-A) export format for AMI data. It adds a new exporter class, field mapping definitions, helper utilities for data transformation, metadata generation, registry integration, comprehensive test coverage, and technical documentation.

Changes

Cohort / File(s)	Summary
Planning & Framework Docs `.agents/planning/dwca-export-plan.md`, `docs/claude/export-framework.md`	High-level planning document for DwC-A feature and technical reference for the Export Framework architecture, registry pattern, and runtime flow.
DwC-A Core Implementation `ami/exports/dwca.py`	Implements DwC field mappings for Event and Occurrence, helper functions for date/coordinate/taxonomy formatting, TSV export logic, meta.xml and eml.xml generation, and ZIP archiving.
Exporter Integration `ami/exports/format_types.py`, `ami/exports/registry.py`	Adds DwCAExporter class with get_queryset, get_events_queryset, and export methods; registers "dwca" format in ExportRegistry (note: DwCAExporter appears duplicated in format_types.py).
Test Suite `ami/exports/tests.py`	Comprehensive DwCAExportTest covering exporter registration, ZIP contents validation, header correctness, row counts, referential integrity, taxonomy extraction, and EML metadata.
DwC-A Reference Documentation `docs/claude/dwca-format-reference.md`	Detailed DwC-A format specification including archive structure, meta.xml/EML schemas, DwC term mappings, and validation rules.

Sequence Diagram

sequenceDiagram
    participant Client
    participant API as DataExport API
    participant Exporter as DwCAExporter
    participant DB as Database
    participant Files as File System
    participant Storage as ZIP Archive

    Client->>API: POST export (format=dwca)
    API->>Exporter: initialize & export()
    Exporter->>DB: get_events_queryset()
    DB-->>Exporter: Event records
    Exporter->>Files: write_tsv(event.txt)
    Files-->>Exporter: event.txt path
    Exporter->>DB: get_queryset()
    DB-->>Exporter: Occurrence records
    Exporter->>Files: write_tsv(occurrence.txt)
    Files-->>Exporter: occurrence.txt path
    Exporter->>Exporter: generate_meta_xml()
    Exporter->>Exporter: generate_eml_xml()
    Exporter->>Storage: create_dwca_zip(event.txt, occurrence.txt, meta.xml, eml.xml)
    Storage-->>Exporter: dwca.zip path
    Exporter-->>API: return ZIP path
    API-->>Client: export complete with download URL

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 79.31% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The PR title 'feat: add Darwin Core Archive (DwC-A) export format' clearly and concisely summarizes the main change: introducing a new export format for DwC-A archives.
Description check	✅ Passed	PR description follows the template structure with Summary, Design decisions, Changes table, Test plan, Known issues, and Near/Eventual follow-ups. All required sections are present and substantive.

Co-Authored-By: Claude <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 12

🤖 Fix all issues with AI agents

In `@ami/exports/dwca.py`:
- Around line 132-142: The _format_duration function must guard against negative
durations: check event.start and event.end and compute delta = event.end -
event.start, then if delta.total_seconds() <= 0 return an empty string (or
optionally log a warning) instead of proceeding; update the logic around
total_seconds / divmod so negative values are not formatted (use the check on
delta.total_seconds() before computing hours/minutes and then format only for
positive durations).
- Around line 62-63: The taxonRank mapping lambda can raise AttributeError when
o.determination.rank is None; update the DWC + "taxonRank" tuple's lambda (the
function referencing o.determination.rank.lower()) to guard by checking both
o.determination and o.determination.rank before calling .lower(), returning an
empty string (or appropriate default) when rank is falsy.
- Line 87: The mapping that emits DWC + "individualCount" currently uses
getattr(o, "detections_count", 0) which is semantically wrong; change the tuple
entry for DWC + "individualCount" so it always emits the string "1" (i.e.,
replace the lambda that references detections_count with one that returns "1")
or remove the entire tuple if you prefer omission; update the tuple containing
DWC + "individualCount" and the lambda (the anonymous function referencing
getattr(o, "detections_count", 0)) accordingly.
- Around line 75-79: The ternary expression in the tuple entry with key
"vernacularName" is ambiguous; update the lambda (lambda o, slug:
o.determination.common_name_en or "" if o.determination else "") to add
parentheses around the ternary's true branch so it reads like
(o.determination.common_name_en or "") if o.determination else "" to make the
intent explicit and easier to read.
- Around line 309-312: The EML export is inserting the owner's email into the
<surName> element (project.owner.email) which leaks PII and is semantically
wrong; update the block that creates creator -> individualName -> surName to
avoid using project.owner.email: prefer using a non-PII display name (e.g.,
project.owner.get_display_name() or project.owner.name) if available, and if no
safe display name exists omit creating the individualName/surName elements
entirely (i.e., only create creator when a valid non-email name/contact is
present). Ensure references to project.owner, creator, individualName, and
surName are updated accordingly.
- Around line 288-290: Change the naive timestamp to a timezone-aware one:
replace the use of datetime.datetime.now() when assigning now (used alongside
project_slug/slugify) with Django's timezone-aware function (timezone.now()),
and add the appropriate import from django.utils import timezone at the top of
the module so now is timezone-aware when used in exports/dwca.py.

In `@ami/exports/format_types.py`:
- Around line 196-208: The Occurrence queryset in get_queryset currently
includes occurrences lacking a related event or determination, producing empty
coreid/scientificName; update get_queryset (the Occurrence.objects.valid() chain
in the get_queryset method) to filter out occurrences with a null event or a null
determination (e.g., .filter(event__isnull=False, determination__isnull=False))
before the select_related/.with_identifications() calls so only occurrences with
both event and determination are exported.
- Around line 237-264: The temp files created for the DwC-A export
(event_file.name, occ_file.name and the produced zip_path) are never cleaned up;
wrap the export sequence in a try/finally around the code that calls
get_events_queryset, write_tsv, generate_meta_xml, generate_eml_xml and
create_dwca_zip so that in the finally block you remove event_file.name,
occ_file.name and (on error or success) delete any intermediate temp zip if
appropriate; ensure update_export_stats and the return of zip_path still occur
on success but that file removal happens only for the intermediate files (or
conditionally for the zip if you decide not to persist it).

In `@ami/exports/tests.py`:
- Around line 477-502: The test test_taxonomy_hierarchy_extraction can pass
vacuously when taxon.parents_json is empty; update the test to guarantee
parents_json is populated before calling _get_rank_from_parents: after obtaining
taxon (from occurrence.determination) and calling
taxon.save(update_calculated_fields=True) and taxon.refresh_from_db(), add an
assertion that taxon.parents_json is truthy (or else set up explicit test data/a
fixture so parents_json contains a parent chain) so the loop over ranks actually
executes and _get_rank_from_parents is exercised.

In `@docs/claude/dwca-format-reference.md`:
- Around line 94-113: The documentation shows EML 2.2.0 namespace
("https://eml.ecoinformatics.org/eml-2.2.0") but the implementation currently
emits EML 2.1.1 (literal "eml://ecoinformatics.org/eml-2.1.1"); either make the
docs reflect 2.1.1 or update the implementation to 2.2.0 so they match. To fix:
search the codebase for the literal "eml://ecoinformatics.org/eml-2.1.1" (and
any code that constructs the eml:eml root or schemaLocation) and change the
namespace and schemaLocation to the 2.2.0 values used in the doc, or
alternatively update the doc snippet to the 2.1.1 namespace and schemaLocation;
ensure the change is applied to the EML root construction logic (the
function/method that builds the eml:eml element) and the documentation fragment
so both use the same namespace string.

In `@docs/claude/export-framework.md`:
- Line 13: Update the documentation table entry that currently lists
JSONExporter and CSVExporter to also include the new DwCAExporter: edit the row
showing `ami/exports/format_types.py` so it mentions Concrete exporters
`JSONExporter`, `CSVExporter`, and `DwCAExporter` (use the exact class name
DwCAExporter to match the implementation).
- Around line 68-71: Update the example output of
ExportRegistry.get_supported_formats() to include the newly registered "dwca"
format so the snippet accurately reflects current registrations; adjust the
example list returned by ExportRegistry.get_supported_formats() (used alongside
ExportRegistry.register and ExportRegistry.get_exporter in the snippet) to
include "dwca" among the formats (e.g., ["occurrences_api_json",
"occurrences_simple_csv", "dwca"]).

🧹 Nitpick comments (3)

docs/claude/dwca-format-reference.md (1)

9-16: Add a language identifier to the fenced code block.

The linter flags this code block for missing a language specifier (MD040). Since it's a file tree diagram, use ```text.

ami/exports/tests.py (1)

329-341: Each test re-runs the full export pipeline independently.

Every test method calls _run_export(), which creates a DataExport, runs the entire export pipeline, and writes to storage. For 7 tests that call this helper, that's significant overhead. Consider using setUpClass or caching the export result in setUp to run the export once and share the ZIP across structural validation tests.

This is fine for correctness; it is just a test performance observation.

ami/exports/dwca.py (1)

286-286: Unused events_queryset parameter.

events_queryset is accepted but never used in generate_eml_xml. Either remove it or use it to populate temporal/geographic coverage metadata (the latter is listed as a follow-up). For now, removing the parameter or marking it for future use avoids confusion.

Proposed fix: remove unused parameter

-def generate_eml_xml(project, events_queryset=None) -> str:
+def generate_eml_xml(project) -> str:

This also requires updating the call site in ami/exports/format_types.py (line 258):

-        eml_xml = generate_eml_xml(self.project, events_qs)
+        eml_xml = generate_eml_xml(self.project)

coderabbitai · 2026-02-11T20:40:20Z

ami/exports/dwca.py

+    (
+        DWC + "vernacularName",
+        "vernacularName",
+        lambda o, slug: o.determination.common_name_en or "" if o.determination else "",
+    ),


🛠️ Refactor suggestion | 🟠 Major

vernacularName lambda: add parentheses for clarity (known issue #4).

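While Python's operator precedence makes this technically correct (`or` binds tighter than the conditional expression, so the lambda already parses as `(o.determination.common_name_en or "") if o.determination else ""`), the expression is confusing to read and a common source of bugs. Adding explicit parentheses removes ambiguity.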
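
The precedence claim can be checked with the standard `ast` module. The variable names `name` and `det` below are placeholders for `o.determination.common_name_en` and `o.determination`; the sketch compares parse trees only and does not evaluate anything.

```python
import ast


def tree(source):
    """Return a position-free dump of the expression's syntax tree."""
    return ast.dump(ast.parse(source, mode="eval"))


ambiguous = 'name or "" if det else ""'
explicit = '(name or "") if det else ""'
other = 'name or ("" if det else "")'

# Both spellings parse to the same tree: a conditional whose body is the `or` expression.
same_tree = tree(ambiguous) == tree(explicit)

# The alternative grouping is a different expression, and it would evaluate
# the left operand even when `det` is None.
differs = tree(other) != tree(ambiguous)
```

Since parentheses leave no trace in the syntax tree, the proposed fix changes readability only, not behaviour.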

Proposed fix

     (
         DWC + "vernacularName",
         "vernacularName",
-        lambda o, slug: o.determination.common_name_en or "" if o.determination else "",
+        lambda o, slug: (o.determination.common_name_en or "") if o.determination else "",
     ),

🧰 Tools

🪛 Ruff (0.15.0)

[warning] 78-78: Unused lambda argument: slug

(ARG005)

coderabbitai · 2026-02-11T20:40:21Z

ami/exports/dwca.py

+def _format_duration(event) -> str:
+    """Format event duration as human-readable string."""
+    if not event.start or not event.end:
+        return ""
+    delta = event.end - event.start
+    total_seconds = int(delta.total_seconds())
+    hours, remainder = divmod(total_seconds, 3600)
+    minutes, _ = divmod(remainder, 60)
+    if hours > 0:
+        return f"{hours}h {minutes}m"
+    return f"{minutes}m"


⚠️ Potential issue | 🟡 Minor

Negative durations are not guarded against.

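If event.end < event.start (data inconsistency), total_seconds will be negative. Because Python's divmod floors toward negative infinity, hours comes out negative while the remainder stays non-negative, so the hours > 0 branch is skipped and the function returns a plausible-looking but wrong value (a span of -1h 30m formats as "30m"). Consider returning an empty string or logging a warning for negative durations.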
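
Python's `divmod` floors toward negative infinity, so it is worth checking what the unguarded logic actually returns for a negative span. The standalone sketch below mirrors the quoted function; the `guard` flag and the `format_duration` name are illustrative and not part of the PR.

```python
from datetime import datetime, timedelta


def format_duration(start, end, guard=True):
    """Format end - start as "Xh Ym"; with guard=True, negative spans return ""."""
    total_seconds = int((end - start).total_seconds())
    if guard and total_seconds < 0:
        return ""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, _ = divmod(remainder, 60)
    if hours > 0:
        return f"{hours}h {minutes}m"
    return f"{minutes}m"


start = datetime(2024, 6, 1, 22, 0)
end = start - timedelta(hours=1, minutes=30)  # end precedes start by 90 minutes

# divmod(-5400, 3600) == (-2, 1800), so the unguarded result looks like a valid "30m".
unguarded = format_duration(start, end, guard=False)
guarded = format_duration(start, end)
```

The unguarded output is not visibly negative, which is why an explicit check is safer than relying on the formatted string to reveal bad data.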

Proposed fix

 def _format_duration(event) -> str:
     """Format event duration as human-readable string."""
     if not event.start or not event.end:
         return ""
     delta = event.end - event.start
     total_seconds = int(delta.total_seconds())
+    if total_seconds < 0:
+        return ""
     hours, remainder = divmod(total_seconds, 3600)
     minutes, _ = divmod(remainder, 60)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

 def _format_duration(event) -> str:
     """Format event duration as human-readable string."""
     if not event.start or not event.end:
         return ""
     delta = event.end - event.start
     total_seconds = int(delta.total_seconds())
+    if total_seconds < 0:
+        return ""
     hours, remainder = divmod(total_seconds, 3600)
     minutes, _ = divmod(remainder, 60)
     if hours > 0:
         return f"{hours}h {minutes}m"
     return f"{minutes}m"

coderabbitai · 2026-02-11T20:40:21Z

ami/exports/dwca.py

+
+    project_slug = slugify(project.name)
+    now = datetime.datetime.now().strftime("%Y-%m-%dT%H:%M:%S")


⚠️ Potential issue | 🟡 Minor

Use timezone-aware datetime.

datetime.datetime.now() produces a naive datetime. In a Django project with USE_TZ=True, this is inconsistent and could produce incorrect timestamps. Use Django's timezone.now().
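
The naive versus aware distinction can be illustrated with the standard library alone. `datetime.now(timezone.utc)` is used here as a stand-in for Django's `timezone.now()`, which returns an aware datetime when `USE_TZ=True`. Adding `%z` to the format string is an assumption about what may be desirable; the format in the PR carries no offset, so the emitted string has no zone marker either way.

```python
from datetime import datetime, timezone

naive = datetime.now()  # no tzinfo attached; meaning depends on the server's local clock
aware = datetime.now(timezone.utc)  # unambiguous instant in UTC

naive_is_naive = naive.tzinfo is None

# Same format as the export code, plus %z so the UTC offset survives serialization.
aware_stamp = aware.strftime("%Y-%m-%dT%H:%M:%S%z")
```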

Proposed fix

+    from django.utils import timezone
+
     project_slug = slugify(project.name)
-    now = datetime.datetime.now().strftime("%Y-%m-%dT%H:%M:%S")
+    now = timezone.now().strftime("%Y-%m-%dT%H:%M:%S")

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

+from django.utils import timezone
+
 project_slug = slugify(project.name)
-now = datetime.datetime.now().strftime("%Y-%m-%dT%H:%M:%S")
+now = timezone.now().strftime("%Y-%m-%dT%H:%M:%S")

coderabbitai · 2026-02-11T20:40:21Z

docs/claude/dwca-format-reference.md

+```xml
+<?xml version="1.0" encoding="UTF-8"?>
+<eml:eml xmlns:eml="https://eml.ecoinformatics.org/eml-2.2.0"
+         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+         xsi:schemaLocation="https://eml.ecoinformatics.org/eml-2.2.0 https://eml.ecoinformatics.org/eml-2.2.0/eml.xsd"
+         packageId="urn:ami:dataset:{project_id}" system="AMI">
+  <dataset>
+    <title>{project.name}</title>
+    <creator>
+      <organizationName>{project.owner or institution}</organizationName>
+    </creator>
+    <abstract>
+      <para>{project.description}</para>
+    </abstract>
+    <intellectualRights>
+      <para>License information here</para>
+    </intellectualRights>
+  </dataset>
+</eml:eml>
+```


⚠️ Potential issue | 🟡 Minor

EML version mismatch between documentation and implementation.

This reference doc shows EML 2.2.0 namespace (https://eml.ecoinformatics.org/eml-2.2.0), but the actual implementation in ami/exports/dwca.py (line 293) uses EML 2.1.1 (eml://ecoinformatics.org/eml-2.1.1). The planning doc acknowledges upgrading to 2.2.0 as a follow-up, but this reference should either match the current implementation or clearly note the discrepancy.

coderabbitai · 2026-02-11T20:40:21Z

docs/claude/export-framework.md

+ExportRegistry.register("format_name")(ExporterClass)
+ExportRegistry.get_exporter("format_name")  # → ExporterClass
+ExportRegistry.get_supported_formats()       # → ["occurrences_api_json", "occurrences_simple_csv"]
+```


⚠️ Potential issue | 🟡 Minor

Example output of get_supported_formats() is incomplete.

The example on line 70 should include "dwca" to reflect the new registration.

Proposed fix

-ExportRegistry.get_supported_formats()       # → ["occurrences_api_json", "occurrences_simple_csv"]
+ExportRegistry.get_supported_formats()       # → ["occurrences_api_json", "occurrences_simple_csv", "dwca"]

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

 ExportRegistry.register("format_name")(ExporterClass)
 ExportRegistry.get_exporter("format_name")  # → ExporterClass
-ExportRegistry.get_supported_formats()       # → ["occurrences_api_json", "occurrences_simple_csv"]
+ExportRegistry.get_supported_formats()       # → ["occurrences_api_json", "occurrences_simple_csv", "dwca"]

Copilot

Pull request overview

Adds a new export format (dwca) to generate a GBIF-compatible Darwin Core Archive (DwC-A) ZIP using an Event core (event.txt) with an Occurrence extension (occurrence.txt), along with generated meta.xml and eml.xml.

Changes:

Introduces ami/exports/dwca.py with DwC field mappings, TSV writing, and meta/eml XML generation.
Adds DwCAExporter to the existing export framework and registers it under "dwca".
Adds a new DwC-A-focused test suite validating ZIP contents and basic structure.

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 14 comments.

File	Description
`ami/exports/dwca.py`	New DwC-A field definitions and archive/metadata generators.
`ami/exports/format_types.py`	Adds `DwCAExporter` implementation to orchestrate TSV generation + ZIP packaging.
`ami/exports/registry.py`	Registers the new `"dwca"` export format.
`ami/exports/tests.py`	Adds end-to-end tests covering ZIP structure, TSV headers/rows, and XML parsing.
`docs/claude/export-framework.md`	New internal technical reference for the export framework.
`docs/claude/dwca-format-reference.md`	New DwC-A format reference documentation.
`.agents/planning/dwca-export-plan.md`	Planning/design notes and known issues list for DwC-A export.

Copilot · 2026-02-11T20:40:45Z

docs/claude/export-framework.md

+|------|---------|
+| `ami/exports/base.py` | `BaseExporter` ABC - all exporters inherit from this |
+| `ami/exports/registry.py` | `ExportRegistry` - maps format strings to exporter classes |
+| `ami/exports/format_types.py` | Concrete exporters: `JSONExporter`, `CSVExporter` |


The “Concrete exporters” table row lists only JSONExporter and CSVExporter, but this PR adds DwCAExporter in ami/exports/format_types.py. Update this reference so the technical doc matches the supported exporters.

Suggested change

-| `ami/exports/format_types.py` | Concrete exporters: `JSONExporter`, `CSVExporter` |
+| `ami/exports/format_types.py` | Concrete exporters: `JSONExporter`, `CSVExporter`, `DwCAExporter` |

Copilot · 2026-02-11T20:40:46Z

ami/exports/format_types.py

+    def get_queryset(self):
+        """Return the occurrence queryset (used by BaseExporter for record count)."""
+        return (
+            Occurrence.objects.valid()  # type: ignore[union-attr]
+            .filter(project=self.project)
+            .select_related(
+                "determination",
+                "event",
+                "deployment",
+            )
+            .with_detections_count()
+            .with_identifications()
+        )


DwC-A occurrence exports can currently include rows where the event or the determination is missing, which produces empty coreid/eventID and scientificName values in occurrence.txt (the GBIF validator rejects these). Filter the queryset to exclude occurrences with a null event or a null determination, and adjust the expected counts in tests accordingly.

Copilot · 2026-02-11T20:40:46Z

ami/exports/format_types.py

+            progress_callback=lambda n: self.update_job_progress(n),
+        )
+        logger.info(f"DwC-A: wrote {occ_count} occurrences")
+


Job progress for DwC-A exports is only updated every 500 occurrences; for smaller exports progress may remain at 0% even though the job completes. Consider calling update_job_progress(occ_count) once at the end (guarding against total_records == 0 to avoid division by zero in BaseExporter.update_job_progress()).
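
The behaviour described here (chunked updates plus one guaranteed final update, with a guard for empty exports) can be sketched independently of the exporter. `export_with_progress` and its `report` callback are hypothetical; the real `update_job_progress` signature lives in `BaseExporter` and is not shown in this PR excerpt.

```python
def export_with_progress(records, total_records, report, chunk_size=500):
    """Call report(fraction) every chunk_size records and once more at the end."""
    written = 0
    for _ in records:
        written += 1
        if written % chunk_size == 0 and total_records:
            report(written / total_records)
    if total_records:  # skip the final update when there is nothing to divide by
        report(written / total_records)
    return written


updates = []
count = export_with_progress(range(120), 120, updates.append)

empty_updates = []
empty_count = export_with_progress([], 0, empty_updates.append)
```

With 120 records and a chunk size of 500, the in-loop callback never fires, so the final call is the only thing that moves progress off zero.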

Suggested change

        # Ensure final progress update, even if the callback was never invoked.
        # Guard against total_records == 0 to avoid division by zero in BaseExporter.update_job_progress().
        if getattr(self, "total_records", 0):
            self.update_job_progress(occ_count)

Copilot · 2026-02-11T20:40:48Z

ami/exports/dwca.py

+    # Column 0 is the id (eventID)
+    id_elem = ET.SubElement(core, "id")
+    id_elem.set("index", "0")
+
+    for i, (term_uri, header, _) in enumerate(event_fields):
+        if i == 0:
+            continue  # Already declared as <id>
+        field = ET.SubElement(core, "field")
+        field.set("index", str(i))
+        field.set("term", term_uri)
+


In meta.xml generation, column 0 is declared as <id index="0"/> but the corresponding <field index="0" term="...eventID"/> mapping is skipped. Many DwC-A consumers/validators expect the ID column to also be mapped to the Darwin Core term (as in the DwC Text Guide examples). Include a <field> mapping for index 0 instead of skipping it.
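A minimal sketch of the suggested mapping, using only the standard library. The field list and the `build_core` helper are hypothetical and much shorter than the real definitions in `ami/exports/dwca.py`; the point is only that `<id index="0"/>` is declared and column 0 also gets its own `<field>` entry.

```python
import xml.etree.ElementTree as ET

DWC = "http://rs.tdwg.org/dwc/terms/"

# Hypothetical field list in column order: (term URI, header, extractor).
EVENT_FIELDS = [
    (DWC + "eventID", "eventID", None),
    (DWC + "eventDate", "eventDate", None),
    (DWC + "samplingProtocol", "samplingProtocol", None),
]


def build_core(fields, row_type=DWC + "Event", location="event.txt"):
    """Build a <core> element that declares <id> and maps every column."""
    core = ET.Element("core", {"rowType": row_type, "ignoreHeaderLines": "1"})
    files = ET.SubElement(core, "files")
    ET.SubElement(files, "location").text = location

    # Column 0 is the record identifier.
    ET.SubElement(core, "id", {"index": "0"})

    # Map every column, including index 0, to its Darwin Core term.
    for i, (term_uri, _header, _extractor) in enumerate(fields):
        ET.SubElement(core, "field", {"index": str(i), "term": term_uri})
    return core


core = build_core(EVENT_FIELDS)
print(ET.tostring(core, encoding="unicode"))
```

Because the loop no longer skips index 0, the same field list still drives both the TSV header and meta.xml, and the identifier column is resolvable by term as well as by position.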

- Guard taxonRank against None rank (AttributeError on .lower())
- Guard _get_rank_from_parents against None det_rank
- Fix vernacularName ternary precedence with explicit parentheses
- Change individualCount to emit "1" (each occurrence = 1 individual, not detections_count which counts bounding boxes)
- Guard _format_duration against negative durations

Co-Authored-By: Claude <noreply@anthropic.com>

- Filter out occurrences with null event or determination from DwC-A export queryset (GBIF rejects empty coreid/scientificName)
- Replace project.owner.email with project.owner.name in EML creator element to avoid leaking PII in downloadable archives
- Only emit individualName when owner has a name set

Co-Authored-By: Claude <noreply@anthropic.com>

- Wrap DwC-A export in try/finally to clean up intermediate temp files
- Use timezone.now() instead of naive datetime.datetime.now()
- Use full EML schemaLocation URL for GBIF validation compatibility
- Remove unused events_queryset parameter from generate_eml_xml
- Simplify progress_callback lambda to direct method reference

Co-Authored-By: Claude <noreply@anthropic.com>
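The naive versus aware distinction behind the `timezone.now()` change can be illustrated with the standard library alone. This is a sketch, not the project's code: Django's `timezone.now()` returns an aware datetime when `USE_TZ` is enabled, and the stdlib call below is used here only as a stand-in for it.

```python
import datetime

# Naive: no tzinfo attached, so the timestamp is ambiguous in an archive.
naive = datetime.datetime.now()

# Aware: carries an explicit UTC offset that survives serialization.
aware = datetime.datetime.now(datetime.timezone.utc)

stamp = aware.isoformat(timespec="seconds")
print(naive.tzinfo)
print(stamp)
```

An aware timestamp serializes with its offset, so a consumer of eml.xml does not have to guess which zone the server was running in.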

- Update occurrence row count test to match null-filtered queryset
- Assert parents_json is populated in taxonomy hierarchy test
- Use settings.MEDIA_URL instead of hardcoded "/media/" in test helper
- Add DwCAExporter to export-framework.md file table and registry example
- Fix EML version in dwca-format-reference.md to match implementation (2.1.1)

Co-Authored-By: Claude <noreply@anthropic.com>

…pdate

- Map all columns (including index 0) to DwC term URIs in meta.xml so GBIF validators can resolve both <id>/<coreid> and <field> entries
- Change fieldsEnclosedBy from empty string to double-quote character for stricter parser compatibility
- Add final progress update after TSV writing so small exports (<500 records) report completion instead of staying at 0%

Co-Authored-By: Claude <noreply@anthropic.com>

Run the export pipeline once in setUpClass and share the ZIP across all structural validation tests instead of re-running per test method. This reduces test time from ~7 export runs to 1. Co-Authored-By: Claude <noreply@anthropic.com>

…occurrences in DwC-A Remove get_filter_backends() override that returned [], allowing DwCAExporter to inherit BaseExporter's OccurrenceCollectionFilter. Update get_events_queryset() to derive events from self.queryset instead of fetching all project events, preventing orphaned events when collection_id filtering is active. Co-Authored-By: Claude <noreply@anthropic.com>

…ssertion Add test_dwca_export_with_collection_filter that verifies filtered exports produce correct occurrence/event counts and referential integrity. Update test_event_headers_and_row_count to expect events derived from occurrences rather than all project events. Co-Authored-By: Claude <noreply@anthropic.com>

mihow and others added 4 commits February 11, 2026 11:45

docs: add feature context and roadmap to DwC-A export plan

928d9fc

Co-Authored-By: Claude <noreply@anthropic.com>

docs: add review findings and follow-up roadmap to DwC-A plan

dd2309e

Co-Authored-By: Claude <noreply@anthropic.com>

mihow marked this pull request as ready for review February 11, 2026 20:34

Copilot AI review requested due to automatic review settings February 11, 2026 20:34

Copilot started reviewing on behalf of mihow February 11, 2026 20:34

coderabbitai bot reviewed Feb 11, 2026

Copilot AI reviewed Feb 11, 2026

mihow and others added 2 commits February 11, 2026 13:33

mihow marked this pull request as draft February 11, 2026 21:51

mihow and others added 8 commits February 11, 2026 15:26

docs(exports): add API and operations reference for export system

e14139d

Co-Authored-By: Claude <noreply@anthropic.com>

docs(exports): merge API reference into export-framework.md

c8aadb7

Add API methods, request/response format, filter system, DwC-A specifics, job integration, and file lifecycle details. Remove separate export-system.md. Co-Authored-By: Claude <noreply@anthropic.com>


		project_slug = slugify(project.name)
		now = datetime.datetime.now().strftime("%Y-%m-%dT%H:%M:%S")



netlify bot commented Feb 11, 2026

✅ Deploy Preview for antenna-preview canceled.

netlify bot commented Feb 11, 2026

✅ Deploy Preview for antenna-ssec canceled.
