From c45a4dc5559106536f18e7332fe8d0e242591127 Mon Sep 17 00:00:00 2001
From: Michael Bunsen
Date: Wed, 11 Feb 2026 11:45:05 -0800
Subject: [PATCH 01/15] docs: add DwC-A export plan and technical references

Add planning document for Darwin Core Archive export format, export framework technical reference, DwC-A format reference with field mappings, and downloaded DwC terms quick reference from TDWG.

Co-Authored-By: Claude
---
 .agents/planning/dwca-export-plan.md |  161 +
 docs/claude/dwc-terms-reference.md   | 4062 ++++++++++++++++++++++++++
 docs/claude/dwca-format-reference.md |  179 ++
 docs/claude/export-framework.md      |  112 +
 4 files changed, 4514 insertions(+)
 create mode 100644 .agents/planning/dwca-export-plan.md
 create mode 100644 docs/claude/dwc-terms-reference.md
 create mode 100644 docs/claude/dwca-format-reference.md
 create mode 100644 docs/claude/export-framework.md

diff --git a/.agents/planning/dwca-export-plan.md b/.agents/planning/dwca-export-plan.md
new file mode 100644
index 000000000..3fbaed15b
--- /dev/null
+++ b/.agents/planning/dwca-export-plan.md
@@ -0,0 +1,161 @@
+# Plan: Add DwC-A (Darwin Core Archive) Export Format
+
+## Context
+
+The project needs to export biodiversity data as Darwin Core Archives for sharing with GBIF and other aggregators. The export framework already exists (`ami/exports/`) with JSON and CSV formats registered. We need to add a new DwC-A exporter that produces a ZIP containing event.txt (core), occurrence.txt (extension), meta.xml, and eml.xml.
+
+**Decisions made:**
+- Event-core architecture (events as core, occurrences as extension)
+- URN format for IDs: `urn:ami:event:{project_slug}:{id}`, `urn:ami:occurrence:{project_slug}:{id}`
+- Coordinates from Deployment lat/lon only (text locality fields deferred)
+- `basisOfRecord` = `"MachineObservation"` for all records
+
+## Implementation Steps
+
+### Step 1: Create DwC-A exporter class
+
+**File:** `ami/exports/format_types.py` (add to existing file)
+
+Create `DwCAExporter(BaseExporter)` with:
+- `file_format = "zip"`
+- `export()` method that orchestrates the full pipeline:
+  1. Write `event.txt` (tab-delimited) from Event queryset
+  2. Write `occurrence.txt` (tab-delimited) from Occurrence queryset
+  3. Generate `meta.xml`
+  4. Generate `eml.xml`
+  5. Package all into a ZIP, return temp file path
+
+**Querysets:**
+- Events: `Event.objects.filter(project=self.project)` with `select_related('deployment', 'deployment__research_site')`
+- Occurrences: `Occurrence.objects.valid().filter(project=self.project)` with `select_related('determination', 'event', 'deployment')` and `.with_timestamps().with_detections_count()`
+
+**Override `get_filter_backends()`** to return backends appropriate for events+occurrences (or empty list if collection filtering doesn't apply to events).
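The packaging half of the Step 1 pipeline can be sketched with the standard library alone. This is a minimal illustration, not the project's implementation: `make_urn`, `clean_cell`, `to_tsv`, and `package_archive` are hypothetical helper names, and the real exporter would iterate Django querysets rather than in-memory lists.

```python
import tempfile
import zipfile


def make_urn(kind: str, project_slug: str, pk: int) -> str:
    """Build an ID following the plan's `urn:ami:{kind}:{project_slug}:{id}` pattern."""
    return f"urn:ami:{kind}:{project_slug}:{pk}"


def clean_cell(value) -> str:
    """Render one TSV cell: None becomes empty, tabs and line breaks become spaces."""
    if value is None:
        return ""
    return str(value).replace("\t", " ").replace("\r", " ").replace("\n", " ")


def to_tsv(headers, rows) -> str:
    """Serialize a header row plus data rows as tab-delimited text."""
    lines = ["\t".join(headers)]
    lines.extend("\t".join(clean_cell(v) for v in row) for row in rows)
    return "\n".join(lines) + "\n"


def package_archive(files: dict) -> str:
    """Write {archive_name: text} into a ZIP on disk and return the temp file path."""
    tmp = tempfile.NamedTemporaryFile(suffix=".zip", delete=False)
    tmp.close()  # Reopen by name so zipfile owns the handle.
    with zipfile.ZipFile(tmp.name, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, text in files.items():
            archive.writestr(name, text)  # str payloads are stored as UTF-8
    return tmp.name
```

Replacing tabs and line breaks inside cell values keeps every record on one line, so the files can be described in `meta.xml` without a field-enclosure character.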
+
+### Step 2: Define DwC field mappings
+
+**File:** `ami/exports/dwca.py` (new file)
+
+Contains:
+- `EVENT_FIELDS`: ordered list of `(dwc_term_uri, header_name, getter_function)` tuples
+- `OCCURRENCE_FIELDS`: same structure
+- Helper functions to extract taxonomy hierarchy from `determination.parents_json` (walk the `list[TaxonParent]` for kingdom, phylum, class, order, family, genus)
+- `get_specific_epithet(name)` - split binomial to get second word
+- `generate_meta_xml(event_fields, occurrence_fields, event_filename, occurrence_filename)` - builds the XML string
+- `generate_eml_xml(project, events_queryset)` - builds minimal EML metadata from project info
+
+**Event field mapping (event.txt):**
+
+| Column | DwC Term | Source |
+|--------|----------|--------|
+| 0 | eventID | `urn:ami:event:{project_slug}:{event.id}` |
+| 1 | eventDate | `event.start`/`event.end` as ISO date interval |
+| 2 | eventTime | time portion of `event.start` |
+| 3 | year | from `event.start` |
+| 4 | month | from `event.start` |
+| 5 | day | from `event.start` |
+| 6 | samplingProtocol | `"automated light trap with camera"` (constant, could be project-level setting later) |
+| 7 | sampleSizeValue | `event.captures_count` |
+| 8 | sampleSizeUnit | `"images"` |
+| 9 | samplingEffort | duration formatted |
+| 10 | locationID | `deployment.name` |
+| 11 | decimalLatitude | `deployment.latitude` |
+| 12 | decimalLongitude | `deployment.longitude` |
+| 13 | geodeticDatum | `"WGS84"` |
+| 14 | datasetName | `project.name` |
+| 15 | modified | `event.updated_at` ISO format |
+
+**Occurrence field mapping (occurrence.txt):**
+
+| Column | DwC Term | Source |
+|--------|----------|--------|
+| 0 | eventID | same URN as core (foreign key) |
+| 1 | occurrenceID | `urn:ami:occurrence:{project_slug}:{occurrence.id}` |
+| 2 | basisOfRecord | `"MachineObservation"` |
+| 3 | occurrenceStatus | `"present"` |
+| 4 | scientificName | `determination.name` |
+| 5 | taxonRank | `determination.rank` (lowercase) |
+| 6 | kingdom | from `determination.parents_json` |
+| 7 | phylum | from `determination.parents_json` |
+| 8 | class | from `determination.parents_json` |
+| 9 | order | from `determination.parents_json` |
+| 10 | family | from `determination.parents_json` |
+| 11 | genus | from `determination.parents_json` |
+| 12 | specificEpithet | second word of species name |
+| 13 | vernacularName | `determination.common_name_en` |
+| 14 | taxonID | `determination.gbif_taxon_key` (if available) |
+| 15 | individualCount | `detections_count` |
+| 16 | identificationVerificationStatus | "verified" if identifications exist, else "unverified" |
+| 17 | modified | `occurrence.updated_at` ISO format |
+
+### Step 3: Register the exporter
+
+**File:** `ami/exports/registry.py`
+
+Add: `ExportRegistry.register("dwca")(DwCAExporter)`
+
+This is all that's needed for it to appear in the API's valid format choices.
+
+### Step 4: Confirm `generate_filename()` behavior
+
+The `DataExport.generate_filename()` uses `exporter.file_format` for the extension. Since `file_format = "zip"`, the filename will be `{project_slug}_export-{pk}.zip` which is correct.
+
+No changes needed to `DataExport` model.
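The taxonomy helpers listed in Step 2 could look roughly like the sketch below. It assumes each `parents_json` entry behaves like a dict with `name` and `rank` keys and that ranks may be stored in upper case; `taxonomy_from_parents` is a hypothetical name, and only `get_specific_epithet(name)` is taken from the plan.

```python
# Ranks that map to the higher-taxonomy columns of occurrence.txt.
DWC_RANKS = ("kingdom", "phylum", "class", "order", "family", "genus")


def taxonomy_from_parents(parents_json) -> dict:
    """Return {rank: name} for the six DwC ranks, with "" where a rank is absent.

    Assumption: entries are dicts shaped like {"id": ..., "name": ..., "rank": ...}.
    """
    found = {}
    for parent in parents_json or []:
        rank = str(parent.get("rank", "")).lower()
        if rank in DWC_RANKS:
            found[rank] = parent.get("name", "")
    return {rank: found.get(rank, "") for rank in DWC_RANKS}


def get_specific_epithet(name: str) -> str:
    """Return the second word of a binomial, or "" for single-word names.

    A real implementation would likely also check that the taxon rank is
    species, since multi-word names exist at other ranks.
    """
    parts = (name or "").split()
    return parts[1] if len(parts) >= 2 else ""
```

Reading the ranks from the denormalized list means each occurrence row needs no extra query, which matches the plan's stated goal of avoiding N+1 lookups on the parent chain.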
+
+### Step 5: Write tests
+
+**File:** `ami/exports/tests.py` (add to existing)
+
+- Test that `DwCAExporter` is registered and retrievable
+- Test that export produces a valid ZIP with expected files (event.txt, occurrence.txt, meta.xml, eml.xml)
+- Test that event.txt has correct headers and row count matches events
+- Test that occurrence.txt has correct headers and row count matches valid occurrences
+- Test that meta.xml is valid XML with correct core/extension structure
+- Test that all occurrence eventIDs reference existing event eventIDs (referential integrity)
+- Test taxonomy hierarchy extraction from `parents_json`
+
+### Step 6: Update documentation
+
+**File:** `docs/claude/dwca-format-reference.md` (already created, update with final field mappings)
+
+## Key Files to Modify
+
+| File | Action |
+|------|--------|
+| `ami/exports/dwca.py` | **New** - DwC field mappings, meta.xml/eml.xml generators, taxonomy helpers |
+| `ami/exports/format_types.py` | **Modify** - Add `DwCAExporter` class |
+| `ami/exports/registry.py` | **Modify** - Register `"dwca"` format |
+| `ami/exports/tests.py` | **Modify** - Add DwC-A tests |
+
+## Key Files to Read (not modify)
+
+| File | Why |
+|------|-----|
+| `ami/exports/base.py` | BaseExporter interface |
+| `ami/exports/models.py` | DataExport model, run_export() flow |
+| `ami/exports/utils.py` | get_data_in_batches(), generate_fake_request() |
+| `ami/main/models.py:1025` | Event model fields |
+| `ami/main/models.py:2808` | Occurrence model fields |
+| `ami/main/models.py:3329` | TaxonParent pydantic model (parents_json schema) |
+| `ami/main/models.py:3349` | Taxon model fields |
+| `docs/claude/reference/example_dwca_exporter.md` | Reference DwC-A implementation |
+
+## Design Decisions
+
+1. **No DRF serializer for DwC-A** - Unlike JSON/CSV exporters that use DRF serializers via `get_data_in_batches()`, the DwC-A exporter writes TSV directly. DwC fields are simple extractions, not nested API representations. This avoids the overhead of serializer instantiation per record.
+
+2. **Direct queryset iteration** - Use `queryset.iterator(chunk_size=500)` for memory efficiency, writing rows directly to the TSV file.
+
+3. **Taxonomy from parents_json** - Walk the `parents_json` list (which contains `{id, name, rank}` dicts) to extract kingdom/phylum/class/order/family/genus. This avoids N+1 queries on the Taxon parent chain.
+
+4. **meta.xml generated from field definitions** - The same field list used for writing TSV columns also drives meta.xml generation, ensuring they stay in sync.
+
+5. **Minimal eml.xml** - Start with project name, description, and owner. Can be enriched later with geographic bounding box, temporal coverage, etc.
+
+6. **Scope for follow-up** - Species checklist (taxon.txt) and multimedia extension (multimedia.txt) are explicitly out of scope for this PR, as stated in the task.
+
+## Verification
+
+1. Run existing export tests to ensure no regression: `docker compose run --rm django python manage.py test ami.exports`
+2. Run new DwC-A tests
+3. Manual test: create a DwC-A export via the API or admin, download the ZIP, inspect contents
+4. Validate with GBIF Data Validator: https://www.gbif.org/tools/data-validator
diff --git a/docs/claude/dwc-terms-reference.md b/docs/claude/dwc-terms-reference.md
new file mode 100644
index 000000000..09c112f09
--- /dev/null
+++ b/docs/claude/dwc-terms-reference.md
@@ -0,0 +1,4062 @@
+
+# Darwin Core Quick Reference Guide
+
+This document is intended to be an easy-to-read reference of the currently (as of 2023-09-18) recommended terms maintained as part of the [Darwin Core standard](https://www.tdwg.org/standards/dwc/) and is maintained by the [Darwin Core Maintenance Group](https://www.tdwg.org/community/dwc/).
+
+**Need help?** Read more about how to use Darwin Core in the [Darwin Core Questions & Answers site](https://github.com/tdwg/dwc-qa/blob/master/README.md). Still have questions? Submit a new issue (question/problem) to the [dwc-qa issues page in GitHub](https://github.com/tdwg/dwc-qa/issues), or use the [form](https://tinyurl.com/darwin-qa). See the bottom of this document for [how to cite Darwin Core](https://dwc.tdwg.org/terms/#cite-darwin-core).
+
+**Want to contribute?** For information about how to contribute to the Darwin Core Standard, including how to propose changes, see the [Guidelines for contributing](https://github.com/tdwg/dwc/blob/master/.github/CONTRIBUTING.md).
+
+This page is not part of the standard, but combines the normative term names and definitions with the non-normative comments and examples that are meant to help people to use the terms consistently. Definitions, comments, and examples may include namespace abbreviations (e.g., "dwc:"). These are included to show that the meaning for the word it is attached to very specifically means the term as defined in that namespace. Thus, dwc:Event means Event as defined by Darwin Core at https://dwc.tdwg.org/terms/#event. Capitalized terms that follow a namespace abbreviation, such as dwc:Occurrence, are Darwin Core class terms, which are a special category of terms used to group sets of property terms (terms that begin with lower case names that follow the namespace abbreviation, e.g., dwc:eventID) for convenience. Comprehensive metadata for current and obsolete terms in human readable form are found in the document [List of Darwin Core terms](../list/).
+
+Additional [files with just the current term names](https://github.com/tdwg/dwc/tree/master/dist) and a [file with the full term history](https://github.com/tdwg/dwc/blob/master/vocabulary/term_versions.csv) can be found in the [Darwin Core repository](https://github.com/tdwg/dwc).
+
+
+## Record-level
+
+This category contains terms that are generic in that they might apply to any type of record in a dataset.
+ + + + + + + + + + + + +
type
Identifier: http://purl.org/dc/elements/1.1/type
Definition: The nature or genre of the resource.
Comments: Must be populated with a value from the DCMI type vocabulary (https://www.dublincore.org/specifications/dublin-core/dcmi-type-vocabulary/2010-10-11/).
Examples
  • StillImage
  • MovingImage
  • Sound
  • PhysicalObject
  • Event
  • Text
+ + + + + + + + + +
modified
Identifier: http://purl.org/dc/terms/modified
Definition: Date on which the resource was changed.
Comments: Recommended best practice is to use a date that conforms to ISO 8601-1:2019.
Examples
  • 1963-03-08T14:07-06:00 (8 Mar 1963 at or after 2:07pm and before 2:08pm in the time zone six hours earlier than UTC)
  • 2009-02-20T08:40Z (20 February 2009 at or after 8:40am and before 8:41 UTC)
  • 2018-08-29T15:19 (29 August 2018 at or after 3:19pm and before 3:20pm local time)
  • 1809-02-12 (within the day 12 February 1809)
  • 1906-06 (in the month of June 1906)
  • 1971 (in the year 1971)
  • 2007-03-01T13:00:00Z/2008-05-11T15:30:00Z (some time within the interval beginning 1 March 2007 at 1pm UTC and before 11 May 2008 at 3:30pm UTC)
  • 1900/1909 (some time within the interval between the beginning of the year 1900 and before the year 1909)
  • 2007-11-13/15 (some time in the interval between the beginning of 13 November 2007 and before 15 November 2007)
+ + + + + + + + + +
language
Identifier: http://purl.org/dc/elements/1.1/language
Definition: A language of the resource.
Comments: Recommended best practice is to use a controlled vocabulary such as RFC 5646. This term has an equivalent in the dcterms: namespace that allows only an IRI as a value, whereas this term allows for any string literal value.
Examples
  • en (for English)
  • es (for Spanish)
+ + + + + + + + + +
license
Identifier: http://purl.org/dc/terms/license
Definition: A legal document giving official permission to do something with the resource.
Comments
Examples
+ + + + + + + + + +
rightsHolder
Identifier: http://purl.org/dc/terms/rightsHolder
Definition: A person or organization owning or managing rights over the resource.
Comments
Examples: The Regents of the University of California
+ + + + + + + + + +
accessRights
Identifier: http://purl.org/dc/terms/accessRights
Definition: Information about who can access the resource or an indication of its security status.
Comments: Access Rights may include information regarding access or restrictions based on privacy, security, or other policies.
Examples
+ + + + + + + + + +
bibliographicCitation
Identifier: http://purl.org/dc/terms/bibliographicCitation
Definition: A bibliographic reference for the resource.
Comments: From Dublin Core, "Recommended practice is to include sufficient bibliographic detail to identify the resource as unambiguously as possible." The intended usage of this term in Darwin Core is to provide the preferred way to cite the resource itself - "how to cite this record". Note that the intended usage of dcterms:references in Darwin Core, by contrast, is to point to the definitive source representation of the resource - "where to find the as-close-to-original reference", if one is available.
Examples
+ + + + + + + + + +
references
Identifier: http://purl.org/dc/terms/references
Definition: A related resource that is referenced, cited, or otherwise pointed to by the described resource.
Comments: From Dublin Core, "This property is intended to be used with non-literal values. This property is an inverse property of Is Referenced By." The intended usage of this term in Darwin Core is to point to the definitive source representation of the resource (e.g., dwc:Taxon, dwc:Occurrence, dwc:Event), if one is available. Note that the intended usage of dcterms:bibliographicCitation in Darwin Core, by contrast, is to provide the preferred way to cite the resource itself.
Examples
+ + + + + + + + + +
feedbackURL
Identifier: http://rs.tdwg.org/dwc/terms/feedbackURL
Definition: A uniform resource locator (URL) that points to a webpage on which a form may be submitted to gather feedback about the record.
Comments: Recommended best practice is to optionally include query strings that act to pre-populate web page form elements and communicate the context.
Examples: https://example.com/new?title=New+issue&body=This+comment+is+about+CAN12345
+ + + + + + + + + +
institutionID
Identifier: http://rs.tdwg.org/dwc/terms/institutionID
Definition: An identifier for the institution having custody of the object(s) or information referred to in the record.
Comments: For physical specimens, the recommended best practice is to use a globally unique and resolvable identifier from a collections registry such as the Research Organization Registry (ROR) or the Global Registry of Scientific Collections (https://scientific-collections.gbif.org/).
Examples
+ + + + + + + + + +
collectionID
Identifier: http://rs.tdwg.org/dwc/terms/collectionID
Definition: An identifier for the collection or dataset from which the record was derived.
Comments: For physical specimens, the recommended best practice is to use a globally unique and resolvable identifier from a collections registry such as the Global Registry of Scientific Collections (https://scientific-collections.gbif.org/).
Examples: https://scientific-collections.gbif.org/collection/fbd3ed74-5a21-4e01-b86a-33d36f032d9c
+ + + + + + + + + +
datasetID
Identifier: http://rs.tdwg.org/dwc/terms/datasetID
Definition: An identifier for the set of data. May be a global unique identifier or an identifier specific to a collection or institution.
Comments
Examples: b15d4952-7d20-46f1-8a3e-556a512b04c5
+ + + + + + + + + +
institutionCode
Identifier: http://rs.tdwg.org/dwc/terms/institutionCode
Definition: The name (or acronym) in use by the institution having custody of the object(s) or information referred to in the record.
Comments
Examples
  • MVZ
  • FMNH
  • CLO
  • UCMP
+ + + + + + + + + +
collectionCode
Identifier: http://rs.tdwg.org/dwc/terms/collectionCode
Definition: The name, acronym, coden, or initialism identifying the collection or data set from which the record was derived.
Comments
Examples
  • Mammals
  • Hildebrandt
  • EBIRD
  • VP
+ + + + + + + + + +
datasetName
Identifier: http://rs.tdwg.org/dwc/terms/datasetName
Definition: The name identifying the data set from which the record was derived.
Comments
Examples
  • Grinnell Resurvey Mammals
  • Lacey Ctenomys Recaptures
+ + + + + + + + + +
ownerInstitutionCode
Identifier: http://rs.tdwg.org/dwc/terms/ownerInstitutionCode
Definition: The name (or acronym) in use by the institution having ownership of the object(s) or information referred to in the record.
Comments
Examples
  • NPS
  • APN
  • InBio
+ + + + + + + + + +
basisOfRecord
Identifier: http://rs.tdwg.org/dwc/terms/basisOfRecord
Definition: The specific nature of the data record.
Comments: Recommended best practice is to use a controlled vocabulary such as the set of local names of the identifiers for classes in Darwin Core.
Examples
  • MaterialEntity
  • PreservedSpecimen
  • FossilSpecimen
  • LivingSpecimen
  • MaterialSample
  • Event
  • HumanObservation
  • MachineObservation
  • Taxon
  • Occurrence
  • MaterialCitation
+ + + + + + + + + +
informationWithheld
Identifier: http://rs.tdwg.org/dwc/terms/informationWithheld
Definition: Additional information that exists, but that has not been shared in the given record.
Comments: This term has an equivalent in the dwciri: namespace that allows only an IRI as a value, whereas this term allows for any string literal value.
Examples
  • location information not given for endangered species
  • collector identities withheld | ask about tissue samples
+ + + + + + + + + +
dataGeneralizations
Identifier: http://rs.tdwg.org/dwc/terms/dataGeneralizations
Definition: Actions taken to make the shared data less specific or complete than in its original form. Suggests that alternative data of higher quality may be available on request.
Comments: This term has an equivalent in the dwciri: namespace that allows only an IRI as a value, whereas this term allows for any string literal value.
Examples: Coordinates generalized from original GPS coordinates to the nearest half degree grid cell.
+ + + + + + + + + +
dynamicProperties
Identifier: http://rs.tdwg.org/dwc/terms/dynamicProperties
Definition: A list of additional measurements, facts, characteristics, or assertions about the record. Meant to provide a mechanism for structured content.
Comments: Recommended best practice is to use a key:value encoding schema for a data interchange format such as JSON.
Examples
  • {"heightInMeters":1.5}
  • {"targusLengthInMeters":0.014, "weightInGrams":120}
  • {"natureOfID":"expert identification", "identificationEvidence":"cytochrome B sequence"}
  • {"relativeHumidity":28, "airTemperatureInCelsius":22, "sampleSizeInKilograms":10}
  • {"aspectHeading":277, "slopeInDegrees":6}
  • {"iucnStatus":"vulnerable", "taxonDistribution":"Neuquén, Argentina"}
+ + +## Occurrence + + + + + + + + + + + +
Occurrence Class
Identifier: http://rs.tdwg.org/dwc/terms/Occurrence
Definition: An existence of a dwc:Organism at a particular place at a particular time.
Comments
Examples
  • a wolf pack on the shore of Kluane Lake in 1988
  • a virus in a plant leaf in the New York Botanical Garden at 15:29 on 2014-10-23
  • a fungus in Central Park in the summer of 1929
+ + + + + + + + + + +
occurrenceID
Identifier: http://rs.tdwg.org/dwc/terms/occurrenceID
Definition: An identifier for the dwc:Occurrence (as opposed to a particular digital record of the dwc:Occurrence). In the absence of a persistent global unique identifier, construct one from a combination of identifiers in the record that will most closely make the dwc:occurrenceID globally unique.
Comments: Recommended best practice is to use a persistent, globally unique identifier.
Examples
+ + + + + + + + + +
catalogNumber
Identifier: http://rs.tdwg.org/dwc/terms/catalogNumber
Definition: An identifier (preferably unique) for the record within the data set or collection.
Comments
Examples
  • 145732
  • 145732a
  • 2008.1334
  • R-4313
+ + + + + + + + + +
recordNumber
Identifier: http://rs.tdwg.org/dwc/terms/recordNumber
Definition: An identifier given to the dwc:Occurrence at the time it was recorded. Often serves as a link between field notes and a dwc:Occurrence record, such as a specimen collector's number.
Comments: This term has an equivalent in the dwciri: namespace that allows only an IRI as a value, whereas this term allows for any string literal value.
Examples: OPP 7101
+ + + + + + + + + +
recordedBy
Identifier: http://rs.tdwg.org/dwc/terms/recordedBy
Definition: A list (concatenated and separated) of names of people, groups, or organizations responsible for recording the original dwc:Occurrence. The primary collector or observer, especially one who applies a personal identifier (dwc:recordNumber), should be listed first.
Comments: Recommended best practice is to separate the values in a list with space vertical bar space ( | ). This term has an equivalent in the dwciri: namespace that allows only an IRI as a value, whereas this term allows for any string literal value.
Examples
  • José E. Crespo
  • Oliver P. Pearson | Anita K. Pearson (where the value in recordNumber OPP 7101 corresponds to the collector number for the specimen in the field catalog of Oliver P. Pearson)
+ + + + + + + + + +
recordedByID
Identifier: http://rs.tdwg.org/dwc/terms/recordedByID
Definition: A list (concatenated and separated) of the globally unique identifier for the person, people, groups, or organizations responsible for recording the original dwc:Occurrence.
Comments: Recommended best practice is to provide a single identifier that disambiguates the details of the identifying agent. If a list is used, it is recommended to separate the values in the list with space vertical bar space ( | ). The order of the identifiers on any list for this term can not be guaranteed to convey any semantics.
Examples
+ + + + + + + + + +
individualCount
Identifier: http://rs.tdwg.org/dwc/terms/individualCount
Definition: The number of individuals present at the time of the dwc:Occurrence.
Comments
Examples
  • 0
  • 1
  • 25
+ + + + + + + + + +
organismQuantity
Identifier: http://rs.tdwg.org/dwc/terms/organismQuantity
Definition: A number or enumeration value for the quantity of dwc:Organisms.
Comments: A dwc:organismQuantity must have a corresponding dwc:organismQuantityType.
Examples
  • 27 (organismQuantity) with individuals (organismQuantityType)
  • 12.5 (organismQuantity) with % biomass (organismQuantityType)
  • r (organismQuantity) with Braun-Blanquet Scale (organismQuantityType)
  • many (organismQuantity) with individuals (organismQuantityType)
+ + + + + + + + + +
organismQuantityType
Identifier: http://rs.tdwg.org/dwc/terms/organismQuantityType
Definition: The type of quantification system used for the quantity of dwc:Organisms.
Comments: A dwc:organismQuantityType must have a corresponding dwc:organismQuantity. This term has an equivalent in the dwciri: namespace that allows only an IRI as a value, whereas this term allows for any string literal value.
Examples
  • 27 (organismQuantity) with individuals (organismQuantityType)
  • 12.5 (organismQuantity) with % biomass (organismQuantityType)
  • r (organismQuantity) with Braun-Blanquet Scale (organismQuantityType)
  • many (organismQuantity) with individuals (organismQuantityType)
+ + + + + + + + + +
sex
Identifier: http://rs.tdwg.org/dwc/terms/sex
Definition: The sex of the biological individual(s) represented in the dwc:Occurrence.
Comments: Recommended best practice is to use a controlled vocabulary. This term has an equivalent in the dwciri: namespace that allows only an IRI as a value, whereas this term allows for any string literal value.
Examples
  • female
  • male
  • hermaphrodite
+ + + + + + + + + +
lifeStage
Identifier: http://rs.tdwg.org/dwc/terms/lifeStage
Definition: The age class or life stage of the dwc:Organism(s) at the time the dwc:Occurrence was recorded.
Comments: Recommended best practice is to use a controlled vocabulary. This term has an equivalent in the dwciri: namespace that allows only an IRI as a value, whereas this term allows for any string literal value.
Examples
  • zygote
  • larva
  • juvenile
  • adult
  • seedling
  • flowering
  • fruiting
+ + + + + + + + + +
reproductiveCondition
Identifier: http://rs.tdwg.org/dwc/terms/reproductiveCondition
Definition: The reproductive condition of the biological individual(s) represented in the dwc:Occurrence.
Comments: Recommended best practice is to use a controlled vocabulary. This term has an equivalent in the dwciri: namespace that allows only an IRI as a value, whereas this term allows for any string literal value.
Examples
  • non-reproductive
  • pregnant
  • in bloom
  • fruit-bearing
+ + + + + + + + + +
caste
Identifier: http://rs.tdwg.org/dwc/terms/caste
Definition: Categorisation of individuals for eusocial species (including some mammals and arthropods).
Comments: Recommended best practice is to use a controlled vocabulary that aligns best with the dwc:Taxon. This term has an equivalent in the dwciri: namespace that allows only an IRI as a value, whereas this term allows for any string literal value.
Examples
  • queen
  • male alate
  • intercaste
  • minor worker
  • soldier
  • ergatoid
+ + + + + + + + + +
behavior
Identifier: http://rs.tdwg.org/dwc/terms/behavior
Definition: The behavior shown by the subject at the time the dwc:Occurrence was recorded.
Comments: This term has an equivalent in the dwciri: namespace that allows only an IRI as a value, whereas this term allows for any string literal value.
Examples
  • roosting
  • foraging
  • running
+ + + + + + + + + +
vitality
Identifier: http://rs.tdwg.org/dwc/terms/vitality
Definition: An indication of whether a dwc:Organism was alive or dead at the time of collection or observation.
Comments: Recommended best practice is to use a controlled vocabulary. Intended to be used with records having a dwc:basisOfRecord of PreservedSpecimen, MaterialEntity, MaterialSample, or HumanObservation. This term has an equivalent in the dwciri: namespace that allows only an IRI as a value, whereas this term allows for any string literal value.
Examples
  • alive
  • dead
  • mixedLot
  • uncertain
  • notAssessed
+ + + + + + + + + +
establishmentMeans
Identifier: http://rs.tdwg.org/dwc/terms/establishmentMeans
Definition: Statement about whether a dwc:Organism has been introduced to a given place and time through the direct or indirect activity of modern humans.
Comments: Recommended best practice is to use controlled value strings from the controlled vocabulary designated for use with this term, listed at http://rs.tdwg.org/dwc/doc/em/. For details, refer to https://doi.org/10.3897/biss.3.38084. This term has an equivalent in the dwciri: namespace that allows only an IRI as a value, whereas this term allows for any string literal value.
Examples
  • native
  • nativeReintroduced
  • introduced
  • introducedAssistedColonisation
  • vagrant
  • uncertain
+ + + + + + + + + +
degreeOfEstablishment
Identifier: http://rs.tdwg.org/dwc/terms/degreeOfEstablishment
Definition: The degree to which a dwc:Organism survives, reproduces, and expands its range at the given place and time.
Comments: Recommended best practice is to use controlled value strings from the controlled vocabulary designated for use with this term, listed at http://rs.tdwg.org/dwc/doc/doe/. For details, refer to https://doi.org/10.3897/biss.3.38084. This term has an equivalent in the dwciri: namespace that allows only an IRI as a value, whereas this term allows for any string literal value.
Examples
  • native
  • captive
  • cultivated
  • released
  • failing
  • casual
  • reproducing
  • established
  • colonising
  • invasive
  • widespreadInvasive
+ + + + + + + + + +
pathway
Identifier: http://rs.tdwg.org/dwc/terms/pathway
Definition: The process by which a dwc:Organism came to be in a given place at a given time.
Comments: Recommended best practice is to use controlled value strings from the controlled vocabulary designated for use with this term, listed at http://rs.tdwg.org/dwc/doc/pw/. For details, refer to https://doi.org/10.3897/biss.3.38084. This term has an equivalent in the dwciri: namespace that allows only an IRI as a value, whereas this term allows for any string literal value.
Examples
  • releasedForUse
  • otherEscape
  • transportContaminant
  • transportStowaway
  • corridor
  • unaided
+ + + + + + + + + +
georeferenceVerificationStatus
Identifier: http://rs.tdwg.org/dwc/terms/georeferenceVerificationStatus
Definition: A categorical description of the extent to which the georeference has been verified to represent the best possible spatial description for the dcterms:Location of the dwc:Occurrence.
Comments: Recommended best practice is to use a controlled vocabulary. This term has an equivalent in the dwciri: namespace that allows only an IRI as a value, whereas this term allows for any string literal value.
Examples
  • unable to georeference
  • requires georeference
  • requires verification
  • verified by data custodian
  • verified by contributor
+ + + + + + + + + +
occurrenceStatus
Identifier: http://rs.tdwg.org/dwc/terms/occurrenceStatus
Definition: A statement about the presence or absence of a dwc:Taxon at a dcterms:Location.
Comments: For dwc:Occurrences, the default vocabulary is recommended to consist of present and absent, but can be extended by implementers with good justification. This term has an equivalent in the dwciri: namespace that allows only an IRI as a value, whereas this term allows for any string literal value.
Examples
  • present
  • absent
+ + + + + + + + + +
associatedMedia
Identifier: http://rs.tdwg.org/dwc/terms/associatedMedia
Definition: A list (concatenated and separated) of identifiers (publication, global unique identifier, URI) of media associated with the dwc:Occurrence.
Comments
Examples: https://arctos.database.museum/media/10520962 | https://arctos.database.museum/media/10520964
+ + + + + + + + + +
associatedOccurrences
Identifier: http://rs.tdwg.org/dwc/terms/associatedOccurrences
Definition: A list (concatenated and separated) of identifiers of other dwc:Occurrence records and their associations to this dwc:Occurrence.
Comments: This term can be used to provide a list of associations to other dwc:Occurrences. Note that the dwc:ResourceRelationship class is an alternative means of representing associations, and with more detail. Recommended best practice is to separate the values in a list with space vertical bar space ( | ).
Examples
+ + + + + + + + + +
associatedReferences
Identifierhttp://rs.tdwg.org/dwc/terms/associatedReferences
DefinitionA list (concatenated and separated) of identifiers (publication, bibliographic reference, global unique identifier, URI) of literature associated with the dwc:Occurrence.
CommentsRecommended best practice is to separate the values in a list with space vertical bar space ( | ). Note that the dwc:ResourceRelationship class is an alternative means of representing associations, and with more detail. Note also that the intended usage of the term dcterms:references in Darwin Core when applied to a dwc:Occurrence is to point to the definitive source representation of that dwc:Occurrence if one is available. Note also that the intended usage of dcterms:bibliographicCitation in Darwin Core when applied to a dwc:Occurrence is to provide the preferred way to cite the dwc:Occurrence itself.
Examples
  • http://www.sciencemag.org/cgi/content/abstract/322/5899/261
  • Christopher J. Conroy, Jennifer L. Neuwald. 2008. Phylogeographic study of the California vole, Microtus californicus Journal of Mammalogy, 89(3):755-767.
  • Steven R. Hoofer and Ronald A. Van Den Bussche. 2001. Phylogenetic Relationships of Plecotine Bats and Allies Based on Mitochondrial Ribosomal Sequences. Journal of Mammalogy 82(1):131-137. | Walker, Faith M., Jeffrey T. Foster, Kevin P. Drees, Carol L. Chambers. 2014. Spotted bat (Euderma maculatum) microsatellite discovery using illumina sequencing. Conservation Genetics Resources.
+ + + + + + + + + +
associatedTaxa
Identifierhttp://rs.tdwg.org/dwc/terms/associatedTaxa
DefinitionA list (concatenated and separated) of identifiers or names of dwc:Taxon records and the associations of this dwc:Occurrence to each of them.
CommentsThis term can be used to provide a list of associations to dwc:Taxon records other than the one defined in the dwc:Occurrence. Note that the dwc:ResourceRelationship class is an alternative means of representing associations, and with more detail. This term is not apt for establishing relationships between dwc:Taxon records, only between specific dwc:Occurrences of a dwc:Organism with other dwc:Taxon records. Recommended best practice is to separate the values in a list with space vertical bar space ( | ).
Examples
  • "host":"Quercus alba"
  • "host":"gbif.org/species/2879737"
  • "parasitoid of":"Cyclocephala signaticollis" | "predator of":"Apis mellifera"
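
The examples above pair a relationship with a taxon as `"relationship":"taxon"` and separate multiple pairs with space vertical bar space. A hedged Python sketch of producing that shape is shown below; `format_associated_taxa` is a hypothetical helper name, and the sketch assumes neither part contains a double quote.

```python
def format_associated_taxa(associations):
    """Format (relationship, taxon) pairs as '"relationship":"taxon"' joined by ' | '."""
    return " | ".join(f'"{relationship}":"{taxon}"' for relationship, taxon in associations)
```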
+ + + + + + + + + +
otherCatalogNumbers
Identifierhttp://rs.tdwg.org/dwc/terms/otherCatalogNumbers
DefinitionA list (concatenated and separated) of previous or alternate fully qualified catalog numbers or other human-used identifiers for the same dwc:Occurrence, whether in the current or any other data set or collection.
CommentsRecommended best practice is to separate the values in a list with space vertical bar space ( | ).
Examples
  • FMNH:Mammal:1234
  • NPS YELLO6778 | MBG 33424
+ + + + + + + + + +
occurrenceRemarks
Identifierhttp://rs.tdwg.org/dwc/terms/occurrenceRemarks
DefinitionComments or notes about the dwc:Occurrence.
Comments
Examplesfound dead on road
+ + +## Organism + + + + + + + + + + + +
Organism Class
Identifierhttp://rs.tdwg.org/dwc/terms/Organism
DefinitionA particular organism or defined group of organisms considered to be taxonomically homogeneous.
CommentsInstances of the dwc:Organism class are intended to facilitate linking one or more dwc:Identification instances to one or more dwc:Occurrence instances. Therefore, things that are typically assigned scientific names (such as viruses, hybrids, and lichens) and aggregates whose dwc:Occurrences are typically recorded (such as packs, clones, and colonies) are included in the scope of this class.
Examples
  • a specific bird
  • a specific wolf pack
  • a specific instance of a bacterial culture
+ + + + + + + + + + +
organismID
Identifierhttp://rs.tdwg.org/dwc/terms/organismID
DefinitionAn identifier for the dwc:Organism instance (as opposed to a particular digital record of the dwc:Organism). May be a globally unique identifier or an identifier specific to the data set.
Comments
Exampleshttp://arctos.database.museum/guid/WNMU:Mamm:1249
+ + + + + + + + + +
organismName
Identifierhttp://rs.tdwg.org/dwc/terms/organismName
DefinitionA textual name or label assigned to a dwc:Organism instance.
Comments
Examples
  • Huberta
  • Boab Prison Tree
  • J pod
+ + + + + + + + + +
organismScope
Identifierhttp://rs.tdwg.org/dwc/terms/organismScope
DefinitionA description of the kind of dwc:Organism instance. Can be used to indicate whether the dwc:Organism instance represents a discrete organism or if it represents a particular type of aggregation.
CommentsRecommended best practice is to use a controlled vocabulary. This term is not intended to be used to specify a type of dwc:Taxon. To describe the kind of dwc:Organism using a URI object in RDF, use rdf:type (http://www.w3.org/1999/02/22-rdf-syntax-ns#type) instead.
Examples
  • multicellular organism
  • virus
  • clone
  • pack
  • colony
+ + + + + + + + + +
causeOfDeath
Identifierhttp://rs.tdwg.org/dwc/terms/causeOfDeath
DefinitionAn indication of the known or suspected cause of death of a dwc:Organism.
CommentsThe cause may be due to natural causes (e.g., disease, predation), human-related activities (e.g., roadkill, pollution), or other environmental factors (e.g., extreme weather events).
Examples
  • trap
  • poison
  • starvation
  • drowning
  • shooting
  • old age
  • vehicle collision
  • disease
  • herbicide
  • burning
  • infanticide
+ + + + + + + + + +
associatedOrganisms
Identifierhttp://rs.tdwg.org/dwc/terms/associatedOrganisms
DefinitionA list (concatenated and separated) of identifiers of other dwc:Organisms and the associations of this dwc:Organism to each of them.
CommentsThis term can be used to provide a list of associations to other dwc:Organisms. Note that the dwc:ResourceRelationship class is an alternative means of representing associations, and with more detail. Recommended best practice is to separate the values in a list with space vertical bar space ( | ).
Examples
+ + + + + + + + + +
previousIdentifications
Identifierhttp://rs.tdwg.org/dwc/terms/previousIdentifications
DefinitionA list (concatenated and separated) of previous assignments of names to the dwc:Organism.
CommentsRecommended best practice is to separate the values in a list with space vertical bar space ( | ).
Examples
  • Chalepidae
  • Pinus abies
  • Anthus sp., field ID by G. Iglesias | Anthus correndera, expert ID by C. Cicero 2009-02-12 based on morphology
+ + + + + + + + + +
organismRemarks
Identifierhttp://rs.tdwg.org/dwc/terms/organismRemarks
DefinitionComments or notes about the dwc:Organism instance.
Comments
ExamplesOne of a litter of six
+ + +## MaterialEntity + + + + + + + + + + + +
MaterialEntity Class
Identifierhttp://rs.tdwg.org/dwc/terms/MaterialEntity
DefinitionAn entity that can be identified, exists for some period of time, and consists in whole or in part of physical matter while it exists.
CommentsThe term is defined at the most general level to admit descriptions of any subtype of material entity within the scope of Darwin Core. In particular, any kind of material sample, preserved specimen, fossil, or exemplar from living collections is intended to be subsumed under this term.
Examples
  • an instance of a fossil
  • an instance of a herbarium sheet with its attached plant specimen
  • a particular part of the plant-derived material affixed to a herbarium sheet
  • an instance of a frozen tissue sample
  • a specific water sample
  • an instance of a meteorite fragment
  • a particular wolf in a zoo
  • a particular pack of wolves in the wild
  • an isolated molecule of DNA
  • a specific deep-frozen DNA sample
  • a particular field notebook
  • a particular paper page from a field notebook
  • an instance of a printed photograph
+ + + + + + + + + + +
materialEntityID
Identifierhttp://rs.tdwg.org/dwc/terms/materialEntityID
DefinitionAn identifier for a particular instance of a dwc:MaterialEntity.
CommentsValues of dwc:materialEntityID are intended to uniquely and persistently identify a particular dwc:MaterialEntity within some context. Examples of context include a particular sample collection, an organization, or the worldwide scale. Recommended best practice is to use a persistent, globally unique identifier. The identifier is bound to a physical object (the dwc:MaterialEntity) as opposed to a particular digital record (representation) of that physical object.
Examples06809dc5-f143-459a-be1a-6f03e63fc083
+ + + + + + + + + +
digitalSpecimenID
Identifierhttp://rs.tdwg.org/dwc/terms/digitalSpecimenID
DefinitionAn identifier for a particular instance of a Digital Specimen.
CommentsA Digital Specimen is defined in https://doi.org/10.3897/rio.7.e67379. A dwc:digitalSpecimenID is intended to uniquely and persistently identify a Digital Specimen. Recommended best practice is to use a DOI with machine readable metadata in the DOI record that uses a community agreed metadata profile (also known as FDO profile) for a Digital Specimen. For an example see: https://doi.org/10.3535/N75-CR4-0SM?noredirect. The identifier is for a digital information artifact (the Digital Specimen) as opposed to an identifier for a specific instance of a dwc:MaterialEntity.
Examples
+ + + + + + + + + +
materialEntityType
Identifierhttp://rs.tdwg.org/dwc/terms/materialEntityType
DefinitionA category that best matches the nature of a dwc:MaterialEntity.
CommentsA more generic classification of a dwc:MaterialEntity than dwc:preparations. Recommended best practice is to use a controlled vocabulary. This term has an equivalent in the dwciri: namespace that allows only an IRI as a value, whereas this term allows for any string literal value.
Examples
  • Macro-object
  • Micro-object
  • Oversized object
  • Cut/polished gemstone
  • Compound Specimen
  • Core
  • Mixed Materials
  • Environmental sample
  • Microscope slide
  • Spore print
  • Macrofossil
  • Mesofossil
  • Microfossil
  • Pinned object/specimen
  • Taxidermy mount
  • Blood sampling cards
  • Oversized fossil
  • Anthropogenic Artifact
+ + + + + + + + + +
discipline
Identifierhttp://rs.tdwg.org/dwc/terms/discipline
DefinitionThe primary branch or branches of knowledge represented by the record.
CommentsThis term can be used to classify records according to branches of knowledge. Recommended best practice is to use a controlled vocabulary and to separate the values in a list with space vertical bar space ( | ). This term has an equivalent in the dwciri: namespace that allows only an IRI as a value, whereas this term allows for any string literal value. It is also recommended to use this field to describe specimenType in MIDS.
Examples
  • Botany
  • Botany | Virology | Taxonomy
+ + + + + + + + + +
preparations
Identifierhttp://rs.tdwg.org/dwc/terms/preparations
DefinitionA list (concatenated and separated) of preparations and preservation methods for a dwc:MaterialEntity.
CommentsRecommended best practice is to separate the values in a list with space vertical bar space ( | ). This term has an equivalent in the dwciri: namespace that allows only an IRI as a value, whereas this term allows for any string literal value.
Examples
  • fossil
  • cast
  • photograph
  • DNA extract
  • skin | skull | skeleton
  • whole animal (EtOH) | tissue (EDTA)
+ + + + + + + + + +
disposition
Identifierhttp://rs.tdwg.org/dwc/terms/disposition
DefinitionThe current state of a dwc:MaterialEntity with respect to a collection.
CommentsRecommended best practice is to use a controlled vocabulary. This term has an equivalent in the dwciri: namespace that allows only an IRI as a value, whereas this term allows for any string literal value.
Examples
  • in collection
  • missing
  • on loan
  • used up
  • destroyed
  • deaccessioned
+ + + + + + + + + +
verbatimLabel
Identifierhttp://rs.tdwg.org/dwc/terms/verbatimLabel
DefinitionThe content of this term should include no embellishments, prefixes, headers or other additions made to the text. Abbreviations must not be expanded and supposed misspellings must not be corrected. Lines or breakpoints between blocks of text that could be verified by seeing the original labels or images of them may be used. Examples of material entities include preserved specimens, fossil specimens, and material samples. Best practice is to use UTF-8 for all characters. Best practice is to add comment “verbatimLabel derived from human transcription” in dwc:occurrenceRemarks.
CommentsExamples can be found at https://dwc.tdwg.org/examples/verbatimLabel.
Examples
+ + + + + + + + + +
associatedSequences
Identifierhttp://rs.tdwg.org/dwc/terms/associatedSequences
DefinitionA list (concatenated and separated) of identifiers (publication, global unique identifier, URI) of genetic sequence information associated with the dwc:MaterialEntity.
Comments
Examples
+ + + + + + + + + +
materialEntityRemarks
Identifierhttp://rs.tdwg.org/dwc/terms/materialEntityRemarks
DefinitionComments or notes about the dwc:MaterialEntity instance.
Comments
Examples
  • found in association with charred remains
  • some original fragments missing
+ + +## MaterialSample + + + + + + + + + + + +
MaterialSample Class
Identifierhttp://rs.tdwg.org/dwc/terms/MaterialSample
DefinitionA material entity that represents an entity of interest in whole or in part.
Comments
Examples
  • a whole organism preserved in a collection
  • a part of an organism isolated for some purpose
  • a soil sample
  • a marine microbial sample
+ + + + + + + + + + +
materialSampleID
Identifierhttp://rs.tdwg.org/dwc/terms/materialSampleID
DefinitionAn identifier for the dwc:MaterialSample (as opposed to a particular digital record of the dwc:MaterialSample). In the absence of a persistent global unique identifier, construct one from a combination of identifiers in the record that will most closely make the dwc:materialSampleID globally unique.
CommentsRecommended best practice is to use a persistent, globally unique identifier.
Examples06809dc5-f143-459a-be1a-6f03e63fc083
+ + +## Event + + + + + + + + + + + +
Event Class
Identifierhttp://rs.tdwg.org/dwc/terms/Event
DefinitionAn action that occurs at some location during some time.
Comments
Examples
  • a specimen collecting event
  • a camera trap image capture
  • a marine trawl
+ + + + + + + + + + +
eventID
Identifierhttp://rs.tdwg.org/dwc/terms/eventID
DefinitionAn identifier for the set of information associated with a dwc:Event (something that occurs at a place and time). May be a global unique identifier or an identifier specific to the data set.
Comments
ExamplesINBO:VIS:Ev:00009375
+ + + + + + + + + +
parentEventID
Identifierhttp://rs.tdwg.org/dwc/terms/parentEventID
DefinitionAn identifier for the broader dwc:Event that groups this and potentially other dwc:Events.
CommentsUse a globally unique identifier for a dwc:Event or an identifier for a dwc:Event that is specific to the data set.
ExamplesA1 (parentEventID to identify the main Whittaker Plot in nested samples, each with its own eventID - A1:1, A1:2).
+ + + + + + + + + +
eventType
Identifierhttp://rs.tdwg.org/dwc/terms/eventType
DefinitionThe nature of the dwc:Event.
CommentsRecommended best practice is to use a controlled vocabulary. Regardless of the dwc:eventType, the interval of the dwc:Event can be captured in dwc:eventDate. This term has an equivalent in the dwciri: namespace that allows only an IRI as a value, whereas this term allows for any string literal value.
Examples
  • Sample
  • Observation
  • Site Visit
  • Biotic Interaction
  • Bioblitz
  • Expedition
  • Survey
  • Project
+ + + + + + + + + +
fieldNumber
Identifierhttp://rs.tdwg.org/dwc/terms/fieldNumber
DefinitionAn identifier given to the dwc:Event in the field. Often serves as a link between field notes and the dwc:Event.
CommentsThis term has an equivalent in the dwciri: namespace that allows only an IRI as a value, whereas this term allows for any string literal value.
ExamplesRV Sol 87-03-08
+ + + + + + + + + +
projectTitle
Identifierhttp://rs.tdwg.org/dwc/terms/projectTitle
DefinitionA list (concatenated and separated) of titles or names for projects that contributed to a dwc:Event.
CommentsUse this term to provide the official name or title of a project as it is commonly known and cited. Avoid abbreviations unless they are widely understood. The recommended best practice is to separate the values in a list with space vertical bar space ( | ).
Examples
  • The Nansen Legacy
  • Scalidophora i Noreg
  • Arctic Deep
+ + + + + + + + + +
projectID
Identifierhttp://rs.tdwg.org/dwc/terms/projectID
DefinitionA list (concatenated and separated) of identifiers for projects that contributed to a dwc:Event.
CommentsA projectID may be shared in multiple distinct datasets. The nature of the association can be described in the metadata project description element. This term should be used to provide a globally unique identifier (GUID) for a project, if available. This could be a DOI, URI, or any other persistent identifier that ensures a project can be uniquely distinguished from others. The recommended best practice is to separate the values in a list with space vertical bar space ( | ).
Examples
+ + + + + + + + + +
fundingAttribution
Identifierhttp://rs.tdwg.org/ac/terms/fundingAttribution
DefinitionText description of organizations or individuals who funded the creation of the resource.
CommentsSpecify the full official name of the funding body. This should include the complete name without abbreviations, unless the abbreviation is an official and commonly recognized form (e.g., NSF for the National Science Foundation). The recommended best practice is to separate the values in a list with space vertical bar space ( | ).
Examples
  • Norges forskningsråd
  • Artsdatabanken
  • Ocean Census | Nippon Foundation
+ + + + + + + + + +
fundingAttributionID
Identifierhttp://rs.tdwg.org/dwc/terms/fundingAttributionID
DefinitionA list (concatenated and separated) of the globally unique identifiers for the funding organizations or agencies that supported the project.
CommentsProvide a unique identifier for the funding body, such as an identifier used in governmental or international databases. If no official identifier exists, use a persistent and unique identifier within your organization or dataset. The recommended best practice is to separate the values in a list with space vertical bar space ( | ).
Examples
+ + + + + + + + + +
eventDate
Identifierhttp://rs.tdwg.org/dwc/terms/eventDate
DefinitionThe date-time or interval during which a dwc:Event occurred. For occurrences, this is the date-time when the dwc:Event was recorded. Not suitable for a time in a geological context.
CommentsRecommended best practice is to use a date that conforms to ISO 8601-1:2019.
Examples
  • 1963-03-08T14:07-06:00 (8 Mar 1963 at or after 2:07pm and before 2:08pm in the time zone six hours earlier than UTC)
  • 2009-02-20T08:40Z (20 February 2009 at or after 8:40am and before 8:41am UTC)
  • 2018-08-29T15:19 (29 August 2018 at or after 3:19pm and before 3:20pm local time)
  • 1809-02-12 (within the day 12 February 1809)
  • 1906-06 (in the month of June 1906)
  • 1971 (in the year 1971)
  • 2007-03-01T13:00:00Z/2008-05-11T15:30:00Z (some time within the interval beginning 1 March 2007 at 1pm UTC and before 11 May 2008 at 3:30pm UTC)
  • 1900/1909 (some time within the interval between the beginning of the year 1900 and before the year 1909)
  • 2007-11-13/15 (some time in the interval between the beginning of 13 November 2007 and before 15 November 2007)
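
The single date-time and start/end interval forms shown above can be produced with Python's standard `datetime` module. This is a minimal sketch rather than a full ISO 8601-1:2019 implementation; `format_event_date` is a hypothetical helper name, and rewriting a `+00:00` offset as `Z` is a stylistic choice made here to match the examples.

```python
from datetime import datetime, timezone


def format_event_date(start, end=None):
    """Return an ISO 8601 date-time, or a 'start/end' interval when end differs."""

    def fmt(dt):
        # isoformat() renders UTC as '+00:00'; the examples above use 'Z'.
        return dt.isoformat(timespec="seconds").replace("+00:00", "Z")

    if end is None or end == start:
        return fmt(start)
    return f"{fmt(start)}/{fmt(end)}"
```

Naive datetimes (without `tzinfo`) are written without an offset, which corresponds to the "local time" form in the examples.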
+ + + + + + + + + +
eventTime
Identifierhttp://rs.tdwg.org/dwc/terms/eventTime
DefinitionThe time or interval during which a dwc:Event occurred.
CommentsRecommended best practice is to use a time of day that conforms to ISO 8601-1:2019.
Examples
  • 14:07-06:00 (at or after 2:07pm and before 2:08pm in the time zone six hours earlier than UTC)
  • 08:40:21Z (at or after 8:40:21am and before 8:40:22am UTC)
  • 13:00:00Z/15:30:00Z (at or after 1pm and before 3:30pm UTC)
+ + + + + + + + + +
startDayOfYear
Identifierhttp://rs.tdwg.org/dwc/terms/startDayOfYear
DefinitionThe earliest integer day of the year on which the dwc:Event occurred (1 for January 1, 365 for December 31, except in a leap year, in which case it is 366).
Comments
Examples
  • 1 (1 January)
  • 366 (31 December)
  • 365 (30 December in a leap year, 31 December in a non-leap year)
+ + + + + + + + + +
endDayOfYear
Identifierhttp://rs.tdwg.org/dwc/terms/endDayOfYear
DefinitionThe latest integer day of the year on which the dwc:Event occurred (1 for January 1, 365 for December 31, except in a leap year, in which case it is 366).
Comments
Examples
  • 1 (1 January)
  • 32 (1 February)
  • 366 (31 December)
  • 365 (30 December in a leap year, 31 December in a non-leap year)
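
The day-of-year values for dwc:startDayOfYear and dwc:endDayOfYear can be derived from a calendar date with the standard library. A small sketch, where `day_of_year` is a hypothetical helper name:

```python
from datetime import date


def day_of_year(d):
    """Return the integer day of the year (1 for 1 January, 365 or 366 for 31 December)."""
    return d.timetuple().tm_yday
```

For an event spanning several days, the helper would be applied to the first and last dates separately.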
+ + + + + + + + + +
year
Identifierhttp://rs.tdwg.org/dwc/terms/year
DefinitionThe four-digit year in which the dwc:Event occurred, according to the Common Era Calendar.
Comments
Examples
  • 1160
  • 2008
+ + + + + + + + + +
month
Identifierhttp://rs.tdwg.org/dwc/terms/month
DefinitionThe integer month in which the dwc:Event occurred.
Comments
Examples
  • 1 (January)
  • 10 (October)
+ + + + + + + + + +
day
Identifierhttp://rs.tdwg.org/dwc/terms/day
DefinitionThe integer day of the month on which the dwc:Event occurred.
Comments
Examples
  • 9
  • 28
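
The dwc:year, dwc:month, and dwc:day terms hold the integer components of a single date. A minimal sketch of deriving all three from one Python `date` follows; `event_date_parts` is a hypothetical helper name, and it assumes the event falls on a single known day.

```python
from datetime import date


def event_date_parts(d):
    """Split a date into the integer values for dwc:year, dwc:month and dwc:day."""
    return {"year": d.year, "month": d.month, "day": d.day}
```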
+ + + + + + + + + +
verbatimEventDate
Identifierhttp://rs.tdwg.org/dwc/terms/verbatimEventDate
DefinitionThe verbatim original representation of the date and time information for a dwc:Event.
Comments
Examples
  • spring 1910
  • Marzo 2002
  • 1999-03-XX
  • 17IV1934
+ + + + + + + + + +
habitat
Identifierhttp://rs.tdwg.org/dwc/terms/habitat
DefinitionA category or description of the habitat in which the dwc:Event occurred.
CommentsThis term has an equivalent in the dwciri: namespace that allows only an IRI as a value, whereas this term allows for any string literal value.
Examples
  • oak savanna
  • pre-cordilleran steppe
+ + + + + + + + + +
samplingProtocol
Identifierhttp://rs.tdwg.org/dwc/terms/samplingProtocol
DefinitionThe names of, references to, or descriptions of the methods or protocols used during a dwc:Event.
CommentsRecommended best practice is to describe a dwc:Event with no more than one sampling protocol. In the case of a summary Event with multiple protocols, in which a specific protocol cannot be attributed to specific dwc:Occurrences, the recommended best practice is to separate the values in a list with space vertical bar space ( | ). This term has an equivalent in the dwciri: namespace that allows only an IRI as a value, whereas this term allows for any string literal value.
Examples
+ + + + + + + + + +
sampleSizeValue
Identifierhttp://rs.tdwg.org/dwc/terms/sampleSizeValue
DefinitionA numeric value for a measurement of the size (time duration, length, area, or volume) of a sample in a sampling dwc:Event.
CommentsA dwc:sampleSizeValue must have a corresponding dwc:sampleSizeUnit.
Examples5 (sampleSizeValue) with metre (sampleSizeUnit)
+ + + + + + + + + +
sampleSizeUnit
Identifierhttp://rs.tdwg.org/dwc/terms/sampleSizeUnit
DefinitionThe unit of measurement of the size (time duration, length, area, or volume) of a sample in a sampling dwc:Event.
CommentsA dwc:sampleSizeUnit must have a corresponding dwc:sampleSizeValue, e.g., 5 for dwc:sampleSizeValue with m for dwc:sampleSizeUnit. This term has an equivalent in the dwciri: namespace that allows only an IRI as a value, whereas this term allows for any string literal value.
Examples
  • minute
  • hour
  • day
  • metre
  • square metre
  • cubic metre
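
Because dwc:sampleSizeValue and dwc:sampleSizeUnit must always accompany each other, an exporter could check the pairing before writing a row. The following is a sketch under that assumption; `sample_size_fields` is a hypothetical helper name, and writing an empty string for a missing value is a choice made here, not something the standard prescribes.

```python
def sample_size_fields(value=None, unit=None):
    """Return both sample size columns, rejecting a value without a unit or vice versa."""
    has_value = value is not None
    has_unit = bool(unit)
    if has_value != has_unit:
        raise ValueError("sampleSizeValue and sampleSizeUnit must be provided together")
    if not has_value:
        return {"sampleSizeValue": "", "sampleSizeUnit": ""}
    return {"sampleSizeValue": str(value), "sampleSizeUnit": unit}
```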
+ + + + + + + + + +
samplingEffort
Identifierhttp://rs.tdwg.org/dwc/terms/samplingEffort
DefinitionThe amount of effort expended during a dwc:Event.
Comments
Examples
  • 40 trap-nights
  • 10 observer-hours
  • 10 km by foot
  • 30 km by car
+ + + + + + + + + +
fieldNotes
Identifierhttp://rs.tdwg.org/dwc/terms/fieldNotes
DefinitionOne of a) an indicator of the existence of, b) a reference to (publication, URI), or c) the text of notes taken in the field about the dwc:Event.
CommentsThis term has an equivalent in the dwciri: namespace that allows only an IRI as a value, whereas this term allows for any string literal value.
ExamplesNotes available in the Grinnell-Miller Library.
+ + + + + + + + + +
eventRemarks
Identifierhttp://rs.tdwg.org/dwc/terms/eventRemarks
DefinitionComments or notes about the dwc:Event.
Comments
ExamplesAfter the recent rains the river is nearly at flood stage.
+ + +## Location + + + + + + + + + + + +
Location Class
Identifierhttp://purl.org/dc/terms/Location
DefinitionA spatial region or named place.
Comments
Examples
  • the municipality of San Carlos de Bariloche, Río Negro, Argentina
  • the place defined by a georeference
+ + + + + + + + + + +
locationID
Identifierhttp://rs.tdwg.org/dwc/terms/locationID
DefinitionAn identifier for the set of dcterms:Location information. May be a global unique identifier or an identifier specific to the data set.
Comments
Exampleshttps://opencontext.org/subjects/768A875F-E205-4D0B-DE55-BAB7598D0FD1
+ + + + + + + + + +
higherGeographyID
Identifierhttp://rs.tdwg.org/dwc/terms/higherGeographyID
DefinitionAn identifier for the geographic region within which the dcterms:Location occurred.
CommentsRecommended best practice is to use a persistent identifier from a controlled vocabulary such as the Getty Thesaurus of Geographic Names.
Exampleshttp://vocab.getty.edu/tgn/1002002 (Antártida e Islas del Atlántico Sur, Territorio Nacional de la Tierra del Fuego, Argentina).
+ + + + + + + + + +
higherGeography
Identifierhttp://rs.tdwg.org/dwc/terms/higherGeography
DefinitionA list (concatenated and separated) of geographic names less specific than the information captured in the dwc:locality term.
CommentsRecommended best practice is to separate the values in a list with space vertical bar space ( | ), with terms in order from least specific to most specific.
Examples
  • North Atlantic Ocean
  • South America | Argentina | Patagonia | Parque Nacional Nahuel Huapi | Neuquén | Los Lagos with accompanying values South America (continent), Argentina (country), Neuquén (first order division), and Los Lagos (second order division)
+ + + + + + + + + +
continent
Identifierhttp://rs.tdwg.org/dwc/terms/continent
DefinitionThe name of the continent in which the dcterms:Location occurs.
CommentsRecommended best practice is to use a controlled vocabulary such as the Getty Thesaurus of Geographic Names. Recommended best practice is to leave this field blank if the dcterms:Location spans multiple entities at this administrative level or if the dcterms:Location might be in one or another of multiple possible entities at this level. Multiplicity and uncertainty of the geographic entity can be captured either in the term dwc:higherGeography or in the term dwc:locality, or both.
Examples
  • Africa
  • Antarctica
  • Asia
  • Europe
  • North America
  • Oceania
  • South America
+ + + + + + + + + +
waterBody
Identifierhttp://rs.tdwg.org/dwc/terms/waterBody
DefinitionThe name of the water body in which the dcterms:Location occurs.
CommentsRecommended best practice is to use a controlled vocabulary such as the Getty Thesaurus of Geographic Names.
Examples
  • Indian Ocean
  • Baltic Sea
  • Hudson River
  • Lago Nahuel Huapi
+ + + + + + + + + +
islandGroup
Identifierhttp://rs.tdwg.org/dwc/terms/islandGroup
DefinitionThe name of the island group in which the dcterms:Location occurs.
CommentsRecommended best practice is to use a controlled vocabulary such as the Getty Thesaurus of Geographic Names.
Examples
  • Alexander Archipelago
  • Archipiélago Diego Ramírez
  • Seychelles
+ + + + + + + + + +
island
Identifierhttp://rs.tdwg.org/dwc/terms/island
DefinitionThe name of the island on or near which the dcterms:Location occurs.
CommentsRecommended best practice is to use a controlled vocabulary such as the Getty Thesaurus of Geographic Names.
Examples
  • Nosy Be
  • Bikini Atoll
  • Vancouver
  • Viti Levu
  • Zanzibar
+ + + + + + + + + +
country
Identifierhttp://rs.tdwg.org/dwc/terms/country
DefinitionThe name of the country or major administrative unit in which the dcterms:Location occurs.
CommentsRecommended best practice is to use a controlled vocabulary such as the Getty Thesaurus of Geographic Names. Recommended best practice is to leave this field blank if the dcterms:Location spans multiple entities at this administrative level or if the dcterms:Location might be in one or another of multiple possible entities at this level. Multiplicity and uncertainty of the geographic entity can be captured either in the term dwc:higherGeography or in the term dwc:locality, or both.
Examples
  • Denmark
  • Colombia
  • España
+ + + + + + + + + +
countryCode
Identifierhttp://rs.tdwg.org/dwc/terms/countryCode
DefinitionThe standard code for the country in which the dcterms:Location occurs.
CommentsRecommended best practice is to use an ISO 3166-1-alpha-2 country code, or ZZ (for an unknown location or a location unassignable to a single country code), or XZ (for the high seas beyond national jurisdictions).
Examples
  • AR
  • SV
  • XZ
  • ZZ
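
A country code value can be given a basic shape check before export. The sketch below only verifies that the value is two ASCII letters, which also admits the special codes ZZ and XZ named above; it does not check membership in the ISO 3166-1 list. `normalize_country_code` is a hypothetical helper name.

```python
import re


def normalize_country_code(code):
    """Upper-case a two-letter code and reject values with any other shape."""
    normalized = code.strip().upper()
    if not re.fullmatch(r"[A-Z]{2}", normalized):
        raise ValueError(f"not a two-letter country code: {code!r}")
    return normalized
```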
+ + + + + + + + + +
stateProvince
Identifierhttp://rs.tdwg.org/dwc/terms/stateProvince
DefinitionThe name of the next smaller administrative region than country (state, province, canton, department, region, etc.) in which the dcterms:Location occurs.
CommentsRecommended best practice is to use a controlled vocabulary such as the Getty Thesaurus of Geographic Names. Recommended best practice is to leave this field blank if the dcterms:Location spans multiple entities at this administrative level or if the dcterms:Location might be in one or another of multiple possible entities at this level. Multiplicity and uncertainty of the geographic entity can be captured either in the term dwc:higherGeography or in the term dwc:locality, or both.
Examples
  • Montana
  • Minas Gerais
  • Córdoba
+ + + + + + + + + +
county
Identifierhttp://rs.tdwg.org/dwc/terms/county
DefinitionThe full, unabbreviated name of the next smaller administrative region than stateProvince (county, shire, department, etc.) in which the dcterms:Location occurs.
CommentsRecommended best practice is to use a controlled vocabulary such as the Getty Thesaurus of Geographic Names. Recommended best practice is to leave this field blank if the dcterms:Location spans multiple entities at this administrative level or if the dcterms:Location might be in one or another of multiple possible entities at this level. Multiplicity and uncertainty of the geographic entity can be captured either in the term dwc:higherGeography or in the term dwc:locality, or both.
Examples
  • Missoula
  • Los Lagos
  • Mataró
+ + + + + + + + + +
municipality
Identifierhttp://rs.tdwg.org/dwc/terms/municipality
DefinitionThe full, unabbreviated name of the next smaller administrative region than county (city, municipality, etc.) in which the dcterms:Location occurs. Do not use this term for a nearby named place that does not contain the actual dcterms:Location.
CommentsRecommended best practice is to use a controlled vocabulary such as the Getty Thesaurus of Geographic Names. Recommended best practice is to leave this field blank if the dcterms:Location spans multiple entities at this administrative level or if the dcterms:Location might be in one or another of multiple possible entities at this level. Multiplicity and uncertainty of the geographic entity can be captured either in the term dwc:higherGeography or in the term dwc:locality, or both.
Examples
  • Holzminden
  • Araçatuba
  • Ga-Segonyana
+ + + + + + + + + +
locality
Identifierhttp://rs.tdwg.org/dwc/terms/locality
DefinitionThe specific description of the place.
CommentsLess specific geographic information can be provided in other geographic terms (dwc:higherGeography, dwc:continent, dwc:country, dwc:stateProvince, dwc:county, dwc:municipality, dwc:waterBody, dwc:island, dwc:islandGroup). This term may contain information modified from the original to correct perceived errors or standardize the description.
Examples
  • Bariloche, 25 km NNE via Ruta Nacional 40 (=Ruta 237)
  • Queets Rainforest, Olympic National Park
+ + + + + + + + + +
verbatimLocality
Identifierhttp://rs.tdwg.org/dwc/terms/verbatimLocality
DefinitionThe original textual description of the place.
Comments
Examples25 km NNE Bariloche por R. Nac. 237
+ + + + + + + + + +
minimumElevationInMeters
Identifierhttp://rs.tdwg.org/dwc/terms/minimumElevationInMeters
DefinitionThe lower limit of the range of elevation (altitude, usually above sea level), in meters.
Comments
Examples
  • -100
  • 802
+ + + + + + + + + +
maximumElevationInMeters
Identifierhttp://rs.tdwg.org/dwc/terms/maximumElevationInMeters
DefinitionThe upper limit of the range of elevation (altitude, usually above sea level), in meters.
Comments
Examples
  • -205
  • 1236
+ + + + + + + + + +
verbatimElevation
Identifierhttp://rs.tdwg.org/dwc/terms/verbatimElevation
DefinitionThe original description of the elevation (altitude, usually above sea level) of the Location.
Comments
Examples100-200 m
+ + + + + + + + + +
verticalDatum
Identifierhttp://rs.tdwg.org/dwc/terms/verticalDatum
DefinitionThe vertical datum used as the reference upon which the values in the elevation terms are based.
CommentsRecommended best practice is to use a controlled vocabulary. This term has an equivalent in the dwciri: namespace that allows only an IRI as a value, whereas this term allows for any string literal value.
Examples
  • EGM84
  • EGM96
  • EGM2008
  • PGM2000A
  • PGM2004
  • PGM2006
  • PGM2007
  • EPSG:7030
  • not recorded
+ + + + + + + + + +
minimumDepthInMeters
Identifierhttp://rs.tdwg.org/dwc/terms/minimumDepthInMeters
DefinitionThe lesser depth of a range of depth below the local surface, in meters.
Comments
Examples
  • 0
  • 100
+ + + + + + + + + +
maximumDepthInMeters
Identifierhttp://rs.tdwg.org/dwc/terms/maximumDepthInMeters
DefinitionThe greater depth of a range of depth below the local surface, in meters.
Comments
Examples
  • 0
  • 200
+ + + + + + + + + +
verbatimDepth
Identifierhttp://rs.tdwg.org/dwc/terms/verbatimDepth
DefinitionThe original description of the depth below the local surface.
Comments
Examples100-200 m
+ + + + + + + + + +
minimumDistanceAboveSurfaceInMeters
Identifierhttp://rs.tdwg.org/dwc/terms/minimumDistanceAboveSurfaceInMeters
DefinitionThe lesser distance in a range of distance from a reference surface in the vertical direction, in meters. Use positive values for locations above the surface, negative values for locations below. If depth measures are given, the reference surface is the location given by the depth, otherwise the reference surface is the location given by the elevation.
Comments
Examples
  • -1.5 (below the surface)
  • 4.2 (above the surface)
  • For a 1.5 meter sediment core from the bottom of a lake (at depth 20m) at 300m elevation: verbatimElevation: 300m, minimumElevationInMeters: 300, maximumElevationInMeters: 300, verbatimDepth: 20m, minimumDepthInMeters: 20, maximumDepthInMeters: 20, minimumDistanceAboveSurfaceInMeters: 0, maximumDistanceAboveSurfaceInMeters: -1.5.
+ + + + + + + + + +
maximumDistanceAboveSurfaceInMeters
Identifierhttp://rs.tdwg.org/dwc/terms/maximumDistanceAboveSurfaceInMeters
DefinitionThe greater distance in a range of distance from a reference surface in the vertical direction, in meters. Use positive values for locations above the surface, negative values for locations below. If depth measures are given, the reference surface is the location given by the depth, otherwise the reference surface is the location given by the elevation.
Comments
Examples
  • -1.5 (below the surface)
  • 4.2 (above the surface)
  • For a 1.5 meter sediment core from the bottom of a lake (at depth 20m) at 300m elevation: verbatimElevation: 300m, minimumElevationInMeters: 300, maximumElevationInMeters: 300, verbatimDepth: 20m, minimumDepthInMeters: 20, maximumDepthInMeters: 20, minimumDistanceAboveSurfaceInMeters: 0, maximumDistanceAboveSurfaceInMeters: -1.5.
+ + + + + + + + + +
locationAccordingTo
Identifierhttp://rs.tdwg.org/dwc/terms/locationAccordingTo
DefinitionInformation about the source of this dcterms:Location information. Could be a publication (gazetteer), institution, or team of individuals.
CommentsThis term has an equivalent in the dwciri: namespace that allows only an IRI as a value, whereas this term allows for any string literal value.
Examples
  • Getty Thesaurus of Geographic Names
  • GADM
+ + + + + + + + + +
locationRemarks
Identifierhttp://rs.tdwg.org/dwc/terms/locationRemarks
DefinitionComments or notes about the dcterms:Location.
Comments
Examplesunder water since 2005
+ + + + + + + + + +
decimalLatitude
Identifierhttp://rs.tdwg.org/dwc/terms/decimalLatitude
DefinitionThe geographic latitude (in decimal degrees, using the spatial reference system given in dwc:geodeticDatum) of the geographic center of a dcterms:Location. Positive values are north of the Equator, negative values are south of it. Legal values lie between -90 and 90, inclusive.
Comments
Examples-41.0983423
+ + + + + + + + + +
decimalLongitude
Identifierhttp://rs.tdwg.org/dwc/terms/decimalLongitude
DefinitionThe geographic longitude (in decimal degrees, using the spatial reference system given in dwc:geodeticDatum) of the geographic center of a dcterms:Location. Positive values are east of the Greenwich Meridian, negative values are west of it. Legal values lie between -180 and 180, inclusive.
Comments
Examples-121.1761111
+ + + + + + + + + +
geodeticDatum
Identifierhttp://rs.tdwg.org/dwc/terms/geodeticDatum
DefinitionThe ellipsoid, geodetic datum, or spatial reference system (SRS) upon which the geographic coordinates given in dwc:decimalLatitude and dwc:decimalLongitude are based.
CommentsRecommended best practice is to use the EPSG code of the SRS, if known. Otherwise use a controlled vocabulary for the name or code of the geodetic datum, if known. Otherwise use a controlled vocabulary for the name or code of the ellipsoid, if known. If none of these is known, use the value not recorded. This term has an equivalent in the dwciri: namespace that allows only an IRI as a value, whereas this term allows for a string literal value.
Examples
  • EPSG:4326
  • WGS84
  • NAD27
  • Campo Inchauspe
  • European 1950
  • Clarke 1866
  • not recorded
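The fallback order recommended in the Comments for dwc:geodeticDatum can be sketched in Python. This is only an illustration of that order, not part of Darwin Core; the function name `pick_geodetic_datum` and its parameters are hypothetical.

```python
from typing import Optional


def pick_geodetic_datum(
    epsg_code: Optional[int] = None,
    datum_name: Optional[str] = None,
    ellipsoid_name: Optional[str] = None,
) -> str:
    """Choose a dwc:geodeticDatum value using the recommended fallback order."""
    # 1. Prefer the EPSG code of the spatial reference system, if known.
    if epsg_code is not None:
        return f"EPSG:{epsg_code}"
    # 2. Otherwise use the name or code of the geodetic datum.
    if datum_name:
        return datum_name
    # 3. Otherwise use the name or code of the ellipsoid.
    if ellipsoid_name:
        return ellipsoid_name
    # 4. If none of these is known, use the literal value "not recorded".
    return "not recorded"
```

For example, `pick_geodetic_datum(epsg_code=4326)` yields `EPSG:4326`, and calling it with no arguments yields `not recorded`.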
+ + + + + + + + + +
coordinateUncertaintyInMeters
Identifierhttp://rs.tdwg.org/dwc/terms/coordinateUncertaintyInMeters
DefinitionThe horizontal distance (in meters) from the given dwc:decimalLatitude and dwc:decimalLongitude describing the smallest circle containing the whole of the dcterms:Location. Leave the value empty if the uncertainty is unknown, cannot be estimated, or is not applicable (because there are no coordinates). Zero is not a valid value for this term.
Comments
Examples
  • 30 (reasonable lower limit on or after 2000-05-01 of a GPS reading under good conditions if the actual precision was not recorded at the time)
  • 100 (reasonable lower limit before 2000-05-01 of a GPS reading under good conditions if the actual precision was not recorded at the time)
  • 71 (uncertainty for a UTM coordinate having 100 meter precision and a known spatial reference system)
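The value constraints stated for dwc:decimalLatitude, dwc:decimalLongitude, and dwc:coordinateUncertaintyInMeters lend themselves to a small pre-export check. The sketch below is a minimal example under stated assumptions: the function name `check_point_radius` is hypothetical, and rejecting negative uncertainties is an assumption (the definition only states that zero is not valid).

```python
from typing import List, Optional


def check_point_radius(
    decimal_latitude: float,
    decimal_longitude: float,
    coordinate_uncertainty_in_meters: Optional[float] = None,
) -> List[str]:
    """Return a list of problems found; an empty list means the values pass."""
    problems = []
    # Legal latitudes lie between -90 and 90, inclusive.
    if not -90 <= decimal_latitude <= 90:
        problems.append("decimalLatitude must lie between -90 and 90")
    # Legal longitudes lie between -180 and 180, inclusive.
    if not -180 <= decimal_longitude <= 180:
        problems.append("decimalLongitude must lie between -180 and 180")
    # None stands for "leave the value empty" (unknown or not applicable).
    # Zero is not valid; negative values are rejected here as an assumption.
    if (
        coordinate_uncertainty_in_meters is not None
        and coordinate_uncertainty_in_meters <= 0
    ):
        problems.append("coordinateUncertaintyInMeters must be greater than zero")
    return problems
```

Returning a list of messages rather than raising lets a caller log every problem with a record before deciding whether to skip it.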
+ + + + + + + + + +
coordinatePrecision
Identifierhttp://rs.tdwg.org/dwc/terms/coordinatePrecision
DefinitionA decimal representation of the precision of the coordinates given in the dwc:decimalLatitude and dwc:decimalLongitude.
Comments
Examples
  • 0.00001 (normal GPS limit for decimal degrees)
  • 0.000278 (nearest second)
  • 0.01667 (nearest minute)
  • 1.0 (nearest degree)
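The example values for dwc:coordinatePrecision follow from simple unit conversion: one minute is 1/60 of a degree and one second is 1/3600 of a degree. A short sketch of that arithmetic (the helper name `precision_in_degrees` is hypothetical):

```python
# Fraction of a degree represented by one unit of each kind.
UNIT_IN_DEGREES = {
    "degree": 1.0,
    "minute": 1 / 60,    # 0.016666...
    "second": 1 / 3600,  # 0.000277...
}


def precision_in_degrees(unit: str) -> float:
    """Return the decimal-degree precision for coordinates recorded to `unit`."""
    return UNIT_IN_DEGREES[unit]


print(round(precision_in_degrees("minute"), 5))  # 0.01667
print(round(precision_in_degrees("second"), 6))  # 0.000278
```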
+ + + + + + + + + +
pointRadiusSpatialFit
Identifierhttp://rs.tdwg.org/dwc/terms/pointRadiusSpatialFit
DefinitionThe ratio of the area of the point-radius (dwc:decimalLatitude, dwc:decimalLongitude, dwc:coordinateUncertaintyInMeters) to the area of the true (original, or most specific) spatial representation of the dcterms:Location. Legal values are 0, greater than or equal to 1, or undefined. A value of 1 is an exact match or 100% overlap. A value of 0 should be used if the given point-radius does not completely contain the original representation. The dwc:pointRadiusSpatialFit is undefined (and should be left empty) if the original representation is any geometry without area (e.g., a point or polyline) and without uncertainty and the given georeference is not that same geometry (without uncertainty). If both the original and the given georeference are the same point, the dwc:pointRadiusSpatialFit is 1.
CommentsDetailed explanations with graphical examples can be found in the Georeferencing Best Practices, Chapman and Wieczorek, 2020 (https://doi.org/10.15468/doc-gg7h-s853).
Examples
  • 0
  • 1
  • 1.5708
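The ratio in the dwc:pointRadiusSpatialFit definition can be worked through numerically. The sketch below simplifies the definition: it takes the area of the original representation as an input, treats a zero-area original as undefined (it does not handle the special case where both representations are the same point), and the function name is hypothetical. The example value 1.5708 appears to correspond to a square enclosed by its smallest containing circle, where the ratio is π/2.

```python
import math
from typing import Optional


def point_radius_spatial_fit(
    radius_m: float,
    original_area_m2: float,
    contains_original: bool,
) -> Optional[float]:
    """Ratio of the point-radius circle area to the original area."""
    if original_area_m2 <= 0:
        # Simplification: geometry without area is treated as undefined,
        # so the field would be left empty.
        return None
    if not contains_original:
        # The circle does not completely contain the original representation.
        return 0.0
    return math.pi * radius_m ** 2 / original_area_m2


# A 1000 m x 1000 m square; the smallest enclosing circle has a radius
# of half the diagonal.
side = 1000.0
radius = side * math.sqrt(2) / 2
print(round(point_radius_spatial_fit(radius, side ** 2, True), 4))  # 1.5708
```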
+ + + + + + + + + +
verbatimCoordinates
Identifierhttp://rs.tdwg.org/dwc/terms/verbatimCoordinates
DefinitionThe verbatim original spatial coordinates of the dcterms:Location. The coordinate ellipsoid, geodeticDatum, or full Spatial Reference System (SRS) for these coordinates should be stored in dwc:verbatimSRS and the coordinate system should be stored in dwc:verbatimCoordinateSystem.
Comments
Examples
  • 41 05 54S 121 05 34W
  • 17T 630000 4833400
+ + + + + + + + + +
verbatimLatitude
Identifierhttp://rs.tdwg.org/dwc/terms/verbatimLatitude
DefinitionThe verbatim original latitude of the dcterms:Location. The coordinate ellipsoid, geodeticDatum, or full Spatial Reference System (SRS) for these coordinates should be stored in dwc:verbatimSRS and the coordinate system should be stored in dwc:verbatimCoordinateSystem.
Comments
Examples41 05 54.03S
+ + + + + + + + + +
verbatimLongitude
Identifierhttp://rs.tdwg.org/dwc/terms/verbatimLongitude
DefinitionThe verbatim original longitude of the dcterms:Location. The coordinate ellipsoid, geodeticDatum, or full Spatial Reference System (SRS) for these coordinates should be stored in dwc:verbatimSRS and the coordinate system should be stored in dwc:verbatimCoordinateSystem.
Comments
Examples121d 10' 34" W
+ + + + + + + + + +
verbatimCoordinateSystem
Identifierhttp://rs.tdwg.org/dwc/terms/verbatimCoordinateSystem
DefinitionThe coordinate format for the dwc:verbatimLatitude and dwc:verbatimLongitude or the dwc:verbatimCoordinates of the dcterms:Location.
CommentsRecommended best practice is to use a controlled vocabulary. This term has an equivalent in the dwciri: namespace that allows only an IRI as a value, whereas this term allows for any string literal value.
Examples
  • decimal degrees
  • degrees decimal minutes
  • degrees minutes seconds
  • UTM
+ + + + + + + + + +
verbatimSRS
Identifierhttp://rs.tdwg.org/dwc/terms/verbatimSRS
DefinitionThe ellipsoid, geodetic datum, or spatial reference system (SRS) upon which coordinates given in dwc:verbatimLatitude and dwc:verbatimLongitude, or dwc:verbatimCoordinates are based.
CommentsRecommended best practice is to use the EPSG code of the SRS, if known. Otherwise use a controlled vocabulary for the name or code of the geodetic datum, if known. Otherwise use a controlled vocabulary for the name or code of the ellipsoid, if known. If none of these is known, use the value not recorded. This term has an equivalent in the dwciri: namespace that allows only an IRI as a value, whereas this term allows for any string literal value.
Examples
  • EPSG:4326
  • WGS84
  • NAD27
  • Campo Inchauspe
  • European 1950
  • Clarke 1866
  • not recorded
+ + + + + + + + + +
footprintWKT
Identifierhttp://rs.tdwg.org/dwc/terms/footprintWKT
DefinitionA Well-Known Text (WKT) representation of the shape (footprint, geometry) that defines the dcterms:Location. A dcterms:Location may have both a point-radius representation (see dwc:decimalLatitude) and a footprint representation, and they may differ from each other.
CommentsThis term has an equivalent in the dwciri: namespace that allows only an IRI as a value, whereas this term allows for any string literal value.
ExamplesPOLYGON ((10 20, 11 20, 11 21, 10 21, 10 20)) (the one-degree bounding box with opposite corners at longitude=10, latitude=20 and longitude=11, latitude=21)
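The bounding-box example above can be reproduced with a few lines of Python. This is a minimal sketch without a geometry library; the function name `bbox_to_wkt` is hypothetical. Note that WKT lists coordinates as x then y, that is, longitude before latitude.

```python
def bbox_to_wkt(min_lon: float, min_lat: float, max_lon: float, max_lat: float) -> str:
    """Build a WKT POLYGON for a bounding box given two opposite corners."""
    # The ring is closed by repeating the first corner at the end.
    corners = [
        (min_lon, min_lat),
        (max_lon, min_lat),
        (max_lon, max_lat),
        (min_lon, max_lat),
        (min_lon, min_lat),
    ]
    # ".10g" keeps up to 10 significant digits and drops trailing zeros.
    ring = ", ".join(f"{lon:.10g} {lat:.10g}" for lon, lat in corners)
    return f"POLYGON (({ring}))"


print(bbox_to_wkt(10, 20, 11, 21))
# POLYGON ((10 20, 11 20, 11 21, 10 21, 10 20))
```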
+ + + + + + + + + +
footprintSRS
Identifierhttp://rs.tdwg.org/dwc/terms/footprintSRS
DefinitionThe ellipsoid, geodetic datum, or spatial reference system (SRS) upon which the geometry given in dwc:footprintWKT is based.
CommentsRecommended best practice is to use the EPSG code of the SRS, if known. Otherwise use a controlled vocabulary for the name or code of the geodetic datum, if known. Otherwise use a controlled vocabulary for the name or code of the ellipsoid, if known. If none of these is known, use the value not recorded. It is also permitted to provide the SRS in Well-Known-Text, especially if no EPSG code provides the necessary values for the attributes of the SRS. Do not use this term to describe the SRS of the dwc:decimalLatitude and dwc:decimalLongitude, nor of any verbatim coordinates - use the dwc:geodeticDatum and dwc:verbatimSRS instead. This term has an equivalent in the dwciri: namespace that allows only an IRI as a value, whereas this term allows for any string literal value.
Examples
  • EPSG:4326
  • GEOGCS["GCS_WGS_1984", DATUM["D_WGS_1984", SPHEROID["WGS_1984",6378137,298.257223563]], PRIMEM["Greenwich",0], UNIT["Degree",0.0174532925199433]] (WKT for the standard WGS84 Spatial Reference System EPSG:4326)
  • not recorded
+ + + + + + + + + +
footprintSpatialFit
Identifierhttp://rs.tdwg.org/dwc/terms/footprintSpatialFit
DefinitionThe ratio of the area of the dwc:footprintWKT to the area of the true (original, or most specific) spatial representation of the dcterms:Location. Legal values are 0, greater than or equal to 1, or undefined. A value of 1 is an exact match or 100% overlap. A value of 0 should be used if the given dwc:footprintWKT does not completely contain the original representation. The dwc:footprintSpatialFit is undefined (and should be left empty) if the original representation is any geometry without area (e.g., a point or polyline) and without uncertainty and the given georeference is not that same geometry (without uncertainty). If both the original and the given georeference are the same point, the dwc:footprintSpatialFit is 1.
CommentsDetailed explanations with graphical examples can be found in the Georeferencing Best Practices, Chapman and Wieczorek, 2020 (https://doi.org/10.15468/doc-gg7h-s853).
Examples
  • 0
  • 1
  • 1.5708
+ + + + + + + + + +
georeferencedBy
Identifierhttp://rs.tdwg.org/dwc/terms/georeferencedBy
DefinitionA list (concatenated and separated) of names of people, groups, or organizations who determined the georeference (spatial representation) for the dcterms:Location.
CommentsRecommended best practice is to separate the values in a list with space vertical bar space ( | ). This term has an equivalent in the dwciri: namespace that allows only an IRI as a value, whereas this term allows for any string literal value.
Examples
  • Brad Millen (ROM)
  • Kristina Yamamoto | Janet Fang
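Several terms in this reference recommend separating list values with space vertical bar space ( | ). A minimal sketch of writing and reading such fields follows; the helper names are hypothetical, and the reader is deliberately tolerant of missing spaces around the bar.

```python
from typing import Iterable, List

SEPARATOR = " | "


def join_values(values: Iterable[str]) -> str:
    """Concatenate non-empty values with the recommended ' | ' separator."""
    return SEPARATOR.join(v.strip() for v in values if v and v.strip())


def split_values(field: str) -> List[str]:
    """Split a list field on '|', tolerating inconsistent spacing."""
    return [part.strip() for part in field.split("|") if part.strip()]


print(join_values(["Kristina Yamamoto", "Janet Fang"]))
# Kristina Yamamoto | Janet Fang
```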
+ + + + + + + + + +
georeferencedDate
Identifierhttp://rs.tdwg.org/dwc/terms/georeferencedDate
DefinitionThe date on which the dcterms:Location was georeferenced.
CommentsRecommended best practice is to use a date that conforms to ISO 8601-1:2019.
Examples
  • 1963-03-08T14:07-06:00 (8 Mar 1963 at or after 2:07pm and before 2:08pm in the time zone six hours earlier than UTC)
  • 2009-02-20T08:40Z (20 February 2009 at or after 8:40am and before 8:41am UTC)
  • 2018-08-29T15:19 (29 August 2018 at or after 3:19pm and before 3:20pm local time)
  • 1809-02-12 (within the day 12 February 1809)
  • 1906-06 (in the month of June 1906)
  • 1971 (in the year 1971)
  • 2007-03-01T13:00:00Z/2008-05-11T15:30:00Z (some time within the interval beginning 1 March 2007 at 1pm UTC and before 11 May 2008 at 3:30pm UTC)
  • 1900/1909 (some time within the interval between the beginning of the year 1900 and before the year 1910)
  • 2007-11-13/15 (some time in the interval between the beginning of 13 November 2007 and before 16 November 2007)
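Date values in the forms shown above can be produced with Python's standard `datetime` module. This is a sketch, not a full ISO 8601-1:2019 implementation: the helper names are hypothetical, minute precision is an arbitrary choice, and intervals are written in the unabbreviated `start/end` form.

```python
from datetime import date, datetime, timedelta, timezone


def to_dwc_datetime(value: datetime) -> str:
    """Format a datetime to minute precision, using 'Z' for UTC."""
    text = value.isoformat(timespec="minutes")
    # isoformat() renders UTC as "+00:00"; the examples use the "Z" suffix.
    if text.endswith("+00:00"):
        text = text[:-6] + "Z"
    return text


def to_dwc_interval(start: date, end: date) -> str:
    """Format a date interval as 'start/end'."""
    return f"{start.isoformat()}/{end.isoformat()}"


print(to_dwc_datetime(datetime(2009, 2, 20, 8, 40, tzinfo=timezone.utc)))
# 2009-02-20T08:40Z
print(to_dwc_datetime(datetime(1963, 3, 8, 14, 7, tzinfo=timezone(timedelta(hours=-6)))))
# 1963-03-08T14:07-06:00
print(to_dwc_interval(date(2007, 11, 13), date(2007, 11, 15)))
# 2007-11-13/2007-11-15
```

A naive datetime (no `tzinfo`) is written without an offset, which matches the "local time" example.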
+ + + + + + + + + +
georeferenceProtocol
Identifierhttp://rs.tdwg.org/dwc/terms/georeferenceProtocol
DefinitionA description or reference to the methods used to determine the spatial footprint, coordinates, and uncertainties.
CommentsThis term has an equivalent in the dwciri: namespace that allows only an IRI as a value, whereas this term allows for any string literal value.
ExamplesGeoreferencing Quick Reference Guide (Zermoglio et al. 2020, https://doi.org/10.35035/e09p-h128)
+ + + + + + + + + +
georeferenceSources
Identifierhttp://rs.tdwg.org/dwc/terms/georeferenceSources
DefinitionA list (concatenated and separated) of maps, gazetteers, or other resources used to georeference the dcterms:Location, described specifically enough to allow anyone in the future to use the same resources.
CommentsRecommended best practice is to separate the values in a list with space vertical bar space ( | ). This term has an equivalent in the dwciri: namespace that allows only an IRI as a value, whereas this term allows for any string literal value.
Examples
+ + + + + + + + + +
georeferenceRemarks
Identifierhttp://rs.tdwg.org/dwc/terms/georeferenceRemarks
DefinitionComments or notes about the spatial description determination, explaining assumptions made in addition or opposition to those formalized in the method referred to in dwc:georeferenceProtocol.
Comments
ExamplesAssumed distance by road (Hwy. 101)
+ + +## GeologicalContext + + + + + + + + + + + +
GeologicalContext Class
Identifierhttp://rs.tdwg.org/dwc/terms/GeologicalContext
DefinitionGeological information, such as stratigraphy, that qualifies a region or place.
Comments
Examplesa lithostratigraphic layer
+ + + + + + + + + + +
geologicalContextID
Identifierhttp://rs.tdwg.org/dwc/terms/geologicalContextID
DefinitionAn identifier for the set of information associated with a dwc:GeologicalContext (the location within a geological context, such as stratigraphy). May be a global unique identifier or an identifier specific to the data set.
Comments
Exampleshttps://opencontext.org/subjects/e54377f7-4452-4315-b676-40679b10c4d9
+ + + + + + + + + +
earliestEonOrLowestEonothem
Identifierhttp://rs.tdwg.org/dwc/terms/earliestEonOrLowestEonothem
DefinitionThe full name of the earliest possible geochronologic eon or lowest chrono-stratigraphic eonothem or the informal name ("Precambrian") attributable to the stratigraphic horizon from which the dwc:MaterialEntity was collected.
Comments
Examples
  • Phanerozoic
  • Proterozoic
+ + + + + + + + + +
latestEonOrHighestEonothem
Identifierhttp://rs.tdwg.org/dwc/terms/latestEonOrHighestEonothem
DefinitionThe full name of the latest possible geochronologic eon or highest chrono-stratigraphic eonothem or the informal name ("Precambrian") attributable to the stratigraphic horizon from which the dwc:MaterialEntity was collected.
Comments
Examples
  • Phanerozoic
  • Proterozoic
+ + + + + + + + + +
earliestEraOrLowestErathem
Identifierhttp://rs.tdwg.org/dwc/terms/earliestEraOrLowestErathem
DefinitionThe full name of the earliest possible geochronologic era or lowest chronostratigraphic erathem attributable to the stratigraphic horizon from which the dwc:MaterialEntity was collected.
Comments
Examples
  • Cenozoic
  • Mesozoic
+ + + + + + + + + +
latestEraOrHighestErathem
Identifierhttp://rs.tdwg.org/dwc/terms/latestEraOrHighestErathem
DefinitionThe full name of the latest possible geochronologic era or highest chronostratigraphic erathem attributable to the stratigraphic horizon from which the dwc:MaterialEntity was collected.
Comments
Examples
  • Cenozoic
  • Mesozoic
+ + + + + + + + + +
earliestPeriodOrLowestSystem
Identifierhttp://rs.tdwg.org/dwc/terms/earliestPeriodOrLowestSystem
DefinitionThe full name of the earliest possible geochronologic period or lowest chronostratigraphic system attributable to the stratigraphic horizon from which the dwc:MaterialEntity was collected.
Comments
Examples
  • Neogene
  • Tertiary
  • Quaternary
+ + + + + + + + + +
latestPeriodOrHighestSystem
Identifierhttp://rs.tdwg.org/dwc/terms/latestPeriodOrHighestSystem
DefinitionThe full name of the latest possible geochronologic period or highest chronostratigraphic system attributable to the stratigraphic horizon from which the dwc:MaterialEntity was collected.
Comments
Examples
  • Neogene
  • Tertiary
  • Quaternary
+ + + + + + + + + +
earliestEpochOrLowestSeries
Identifierhttp://rs.tdwg.org/dwc/terms/earliestEpochOrLowestSeries
DefinitionThe full name of the earliest possible geochronologic epoch or lowest chronostratigraphic series attributable to the stratigraphic horizon from which the dwc:MaterialEntity was collected.
Comments
Examples
  • Holocene
  • Pleistocene
  • Ibexian Series
+ + + + + + + + + +
latestEpochOrHighestSeries
Identifierhttp://rs.tdwg.org/dwc/terms/latestEpochOrHighestSeries
DefinitionThe full name of the latest possible geochronologic epoch or highest chronostratigraphic series attributable to the stratigraphic horizon from which the dwc:MaterialEntity was collected.
Comments
Examples
  • Holocene
  • Pleistocene
  • Ibexian Series
+ + + + + + + + + +
earliestAgeOrLowestStage
Identifierhttp://rs.tdwg.org/dwc/terms/earliestAgeOrLowestStage
DefinitionThe full name of the earliest possible geochronologic age or lowest chronostratigraphic stage attributable to the stratigraphic horizon from which the dwc:MaterialEntity was collected.
Comments
Examples
  • Atlantic
  • Boreal
  • Skullrockian
+ + + + + + + + + +
latestAgeOrHighestStage
Identifierhttp://rs.tdwg.org/dwc/terms/latestAgeOrHighestStage
DefinitionThe full name of the latest possible geochronologic age or highest chronostratigraphic stage attributable to the stratigraphic horizon from which the dwc:MaterialEntity was collected.
Comments
Examples
  • Atlantic
  • Boreal
  • Skullrockian
+ + + + + + + + + +
lowestBiostratigraphicZone
Identifierhttp://rs.tdwg.org/dwc/terms/lowestBiostratigraphicZone
DefinitionThe full name of the lowest possible geological biostratigraphic zone of the stratigraphic horizon from which the dwc:MaterialEntity was collected.
Comments
ExamplesMaastrichtian
+ + + + + + + + + +
highestBiostratigraphicZone
Identifierhttp://rs.tdwg.org/dwc/terms/highestBiostratigraphicZone
DefinitionThe full name of the highest possible geological biostratigraphic zone of the stratigraphic horizon from which the dwc:MaterialEntity was collected.
Comments
ExamplesBlancan
+ + + + + + + + + +
lithostratigraphicTerms
Identifierhttp://rs.tdwg.org/dwc/terms/lithostratigraphicTerms
DefinitionThe combination of all lithostratigraphic names for the rock from which the dwc:MaterialEntity was collected.
Comments
ExamplesPleistocene-Weichselien
+ + + + + + + + + +
group
Identifierhttp://rs.tdwg.org/dwc/terms/group
DefinitionThe full name of the lithostratigraphic group from which the dwc:MaterialEntity was collected.
Comments
Examples
  • Bathurst
  • Lower Wealden
+ + + + + + + + + +
formation
Identifierhttp://rs.tdwg.org/dwc/terms/formation
DefinitionThe full name of the lithostratigraphic formation from which the dwc:MaterialEntity was collected.
Comments
Examples
  • Notch Peak Formation
  • House Limestone
  • Fillmore Formation
+ + + + + + + + + +
member
Identifierhttp://rs.tdwg.org/dwc/terms/member
DefinitionThe full name of the lithostratigraphic member from which the dwc:MaterialEntity was collected.
Comments
Examples
  • Lava Dam Member
  • Hellnmaria Member
+ + + + + + + + + +
bed
Identifierhttp://rs.tdwg.org/dwc/terms/bed
DefinitionThe full name of the lithostratigraphic bed from which the dwc:MaterialEntity was collected.
Comments
ExamplesHarlem coal
+ + +## Identification + + + + + + + + + + + +
Identification Class
Identifierhttp://rs.tdwg.org/dwc/terms/Identification
DefinitionA taxonomic determination (e.g., the assignment to a dwc:Taxon).
Comments
Examplesa subspecies determination of an organism
+ + + + + + + + + + +
identificationID
Identifierhttp://rs.tdwg.org/dwc/terms/identificationID
DefinitionAn identifier for the dwc:Identification (the body of information associated with the assignment of a scientific name). May be a global unique identifier or an identifier specific to the data set.
Comments
Examples9992
+ + + + + + + + + +
verbatimIdentification
Identifierhttp://rs.tdwg.org/dwc/terms/verbatimIdentification
DefinitionA string representing the taxonomic identification as it appeared in the original record.
CommentsThis term is meant to allow the capture of an unaltered original identification/determination, including identification qualifiers, hybrid formulas, uncertainties, etc. This term is meant to be used in addition to dwc:scientificName (and dwc:identificationQualifier etc.), not instead of it.
Examples
  • Peromyscus sp.
  • Ministrymon sp. nov. 1
  • Anser anser × Branta canadensis
  • Pachyporidae?
  • Potentilla × pantotricha Soják
  • Aconitum pilipes × A. variegatum
  • Lepomis auritus x cyanellus
+ + + + + + + + + +
identificationQualifier
Identifierhttp://rs.tdwg.org/dwc/terms/identificationQualifier
DefinitionA brief phrase or a standard term ("cf.", "aff.") to express the determiner's doubts about the dwc:Identification.
CommentsThis term has an equivalent in the dwciri: namespace that allows only an IRI as a value, whereas this term allows for any string literal value.
Examples
  • aff. agrifolia var. oxyadenia (for Quercus aff. agrifolia var. oxyadenia with accompanying values Quercus in genus, agrifolia in specificEpithet, oxyadenia in infraspecificEpithet, and var. in taxonRank)
  • cf. var. oxyadenia (for Quercus agrifolia cf. var. oxyadenia with accompanying values Quercus in genus, agrifolia in specificEpithet, oxyadenia in infraspecificEpithet, and var. in taxonRank)
+ + + + + + + + + +
typeStatus
Identifierhttp://rs.tdwg.org/dwc/terms/typeStatus
DefinitionA list (concatenated and separated) of nomenclatural types (type status, typified scientific name, publication) applied to the subject.
CommentsRecommended best practice is to separate the values in a list with space vertical bar space ( | ). This term has an equivalent in the dwciri: namespace that allows only an IRI as a value, whereas this term allows for any string literal value.
Examples
  • holotype of Ctenomys sociabilis. Pearson O. P., and M. I. Christie. 1985. Historia Natural, 5(37):388
  • holotype of Pinus abies | holotype of Picea abies
+ + + + + + + + + +
typifiedName
Identifierhttp://rs.tdwg.org/dwc/terms/typifiedName
DefinitionA scientific name that is based on a type specimen.
CommentsRecommended best practice is also to indicate the dwc:typeStatus of the specimen.
ExamplesPolysiphonia amphibolis Womersley
+ + + + + + + + + +
identifiedBy
Identifierhttp://rs.tdwg.org/dwc/terms/identifiedBy
DefinitionA list (concatenated and separated) of names of people, groups, or organizations who assigned the dwc:Taxon to the subject.
CommentsWhen used in the context of an Event (such as in the Humboldt Extension), the subject consists of all of the dwc:Organisms related to the Event. Recommended best practice is to separate the values in a list with space vertical bar space ( | ). This term has an equivalent in the dwciri: namespace that allows only an IRI as a value, whereas this term allows for any string literal value.
ExamplesJames L. Patton | Theodore Pappenfuss | Robert Macey
+ + + + + + + + + +
identifiedByID
Identifierhttp://rs.tdwg.org/dwc/terms/identifiedByID
DefinitionA list (concatenated and separated) of the globally unique identifier for the person, people, groups, or organizations responsible for assigning the dwc:Taxon to the subject.
CommentsRecommended best practice is to provide a single identifier that disambiguates the details of the identifying agent. If a list is used, the order of the identifiers on the list should not be assumed to convey any semantics. Recommended best practice is to separate the values in a list with space vertical bar space ( | ).
Examples
+ + + + + + + + + +
dateIdentified
Identifierhttp://rs.tdwg.org/dwc/terms/dateIdentified
DefinitionThe date on which the subject was determined as representing the dwc:Taxon.
CommentsRecommended best practice is to use a date that conforms to ISO 8601-1:2019.
Examples
  • 1963-03-08T14:07-06:00 (8 Mar 1963 at or after 2:07pm and before 2:08pm in the time zone six hours earlier than UTC)
  • 2009-02-20T08:40Z (20 February 2009 at or after 8:40am and before 8:41am UTC)
  • 2018-08-29T15:19 (29 August 2018 at or after 3:19pm and before 3:20pm local time)
  • 1809-02-12 (within the day 12 February 1809)
  • 1906-06 (in the month of June 1906)
  • 1971 (in the year 1971)
  • 2007-03-01T13:00:00Z/2008-05-11T15:30:00Z (some time within the interval beginning 1 March 2007 at 1pm UTC and before 11 May 2008 at 3:30pm UTC)
  • 1900/1909 (some time within the interval between the beginning of the year 1900 and before the year 1910)
  • 2007-11-13/15 (some time in the interval between the beginning of 13 November 2007 and before 16 November 2007)
+ + + + + + + + + +
identificationReferences
Identifierhttp://rs.tdwg.org/dwc/terms/identificationReferences
DefinitionA list (concatenated and separated) of references (publication, global unique identifier, URI) used in the dwc:Identification.
CommentsWhen used in the context of an Event (such as in the Humboldt Extension), the subject consists of all of the dwc:Organisms related to the Event. Recommended best practice is to separate the values in a list with space vertical bar space ( | ).
Examples
  • Aves del Noroeste Patagonico. Christie et al. 2004.
  • Stebbins, R. Field Guide to Western Reptiles and Amphibians. 3rd Edition. 2003. | Irschick, D.J. and Shaffer, H.B. (1997). The polytypic species revisited: Morphological differentiation among tiger salamanders (Ambystoma tigrinum) (Amphibia: Caudata). Herpetologica, 53(1), 30-49.
+ + + + + + + + + +
identificationVerificationStatus
Identifierhttp://rs.tdwg.org/dwc/terms/identificationVerificationStatus
DefinitionA categorical indicator of the extent to which the taxonomic identification has been verified to be correct.
CommentsRecommended best practice is to use a controlled vocabulary such as that used in HISPID and ABCD. This term has an equivalent in the dwciri: namespace that allows only an IRI as a value, whereas this term allows for any string literal value.
Examples0 ("unverified" in HISPID/ABCD).
+ + + + + + + + + +
identificationRemarks
Identifierhttp://rs.tdwg.org/dwc/terms/identificationRemarks
DefinitionComments or notes about the dwc:Identification.
Comments
ExamplesDistinguished between Anthus correndera and Anthus hellmayri based on the comparative lengths of the uñas.
+ + +## Taxon + + + + + + + + + + + +
Taxon Class
Identifierhttp://rs.tdwg.org/dwc/terms/Taxon
DefinitionA group of organisms (sensu http://purl.obolibrary.org/obo/OBI_0100026) considered by taxonomists to form a homogeneous unit.
Comments
Examplesthe genus Truncorotaloides as published by Brönnimann et al. in 1953 in the Journal of Paleontology Vol. 27(6) p. 817-820
+ + + + + + + + + + +
taxonID
Identifierhttp://rs.tdwg.org/dwc/terms/taxonID
DefinitionAn identifier for the set of dwc:Taxon information. May be a global unique identifier or an identifier specific to the data set.
Comments
Examples
+ + + + + + + + + +
scientificNameID
Identifierhttp://rs.tdwg.org/dwc/terms/scientificNameID
DefinitionAn identifier for the nomenclatural (not taxonomic) details of a scientific name.
Comments
Examplesurn:lsid:ipni.org:names:37829-1:1.3
+ + + + + + + + + +
acceptedNameUsageID
Identifierhttp://rs.tdwg.org/dwc/terms/acceptedNameUsageID
DefinitionAn identifier for the name usage (documented meaning of the name according to a source) of the currently valid (zoological) or accepted (botanical) taxon.
CommentsThis term should be used for synonyms or misapplied names to refer to the dwc:taxonID of a dwc:Taxon record that represents the accepted (botanical) or valid (zoological) name. For Darwin Core Archives the related record should be present locally in the same archive.
Examples
  • tsn:41107 (ITIS)
  • urn:lsid:ipni.org:names:320035-2 (IPNI)
  • 2704179 (GBIF)
  • 6W3C4 (COL)
+ + + + + + + + + +
parentNameUsageID
Identifierhttp://rs.tdwg.org/dwc/terms/parentNameUsageID
DefinitionAn identifier for the name usage (documented meaning of the name according to a source) of the direct, most proximate higher-rank parent taxon (in a classification) of the most specific element of the dwc:scientificName.
CommentsThis term should be used for accepted names to refer to the dwc:taxonID of a dwc:Taxon record that represents the next higher taxon rank in the same taxonomic classification. For Darwin Core Archives the related record should be present locally in the same archive.
Examples
  • tsn:41074 (ITIS)
  • urn:lsid:ipni.org:names:30001404-2 (IPNI)
  • 2704173 (GBIF)
  • 6T8N (COL)
+ + + + + + + + + +
originalNameUsageID
Identifierhttp://rs.tdwg.org/dwc/terms/originalNameUsageID
DefinitionAn identifier for the name usage (documented meaning of the name according to a source) in which the terminal element of the dwc:scientificName was originally established under the rules of the associated dwc:nomenclaturalCode.
CommentsThis term should be used to refer to the dwc:taxonID of a dwc:Taxon record that represents the usage of the terminal element of the dwc:scientificName as originally established under the rules of the associated dwc:nomenclaturalCode. For example, for names governed by the ICNafp, this term would establish the relationship between a record representing a subsequent combination and the record for its corresponding basionym. Unlike basionyms, however, this term can apply to scientific names at all ranks. For Darwin Core Archives the related record should be present locally in the same archive.
Examples
  • tsn:41107 (ITIS)
  • urn:lsid:ipni.org:names:320035-2 (IPNI)
  • 2704179 (GBIF)
  • 6W3C4 (COL)
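
The comments on `acceptedNameUsageID`, `parentNameUsageID` and `originalNameUsageID` all say that, in a Darwin Core Archive, the related record should be present locally in the same archive. A minimal sketch of how an exporter might check that is shown below. It assumes taxon rows are plain dictionaries keyed by Darwin Core term names; the function name `find_dangling_references` is hypothetical and not part of any existing library or of this project.

```python
# Hypothetical helper: rows are assumed to be dicts keyed by Darwin Core term names.
REFERENCE_FIELDS = ("acceptedNameUsageID", "parentNameUsageID", "originalNameUsageID")


def find_dangling_references(rows, reference_fields=REFERENCE_FIELDS):
    """Return (taxonID, field, value) for every reference with no local taxon record."""
    known_ids = {row["taxonID"] for row in rows if row.get("taxonID")}
    dangling = []
    for row in rows:
        for field in reference_fields:
            value = row.get(field)
            # Empty values are allowed; only non-empty references must resolve locally.
            if value and value not in known_ids:
                dangling.append((row.get("taxonID"), field, value))
    return dangling


taxa = [
    {"taxonID": "t1", "scientificName": "Puma concolor", "parentNameUsageID": "t2"},
    {"taxonID": "t2", "scientificName": "Puma"},
    {"taxonID": "t3", "scientificName": "Felis concolor", "acceptedNameUsageID": "t9"},
]
problems = find_dangling_references(taxa)
```

Here `t3` points at `t9`, which is not in the archive, so it is the only row reported.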
+ + + + + + + + + +
nameAccordingToID
Identifierhttp://rs.tdwg.org/dwc/terms/nameAccordingToID
DefinitionAn identifier for the source in which the specific taxon concept circumscription is defined or implied. See dwc:nameAccordingTo.
Comments
Exampleshttps://doi.org/10.1016/S0269-915X(97)80026-2
+ + + + + + + + + +
namePublishedInID
Identifierhttp://rs.tdwg.org/dwc/terms/namePublishedInID
DefinitionAn identifier for the publication in which the dwc:scientificName was originally established under the rules of the associated dwc:nomenclaturalCode.
CommentsA citation of the first publication of the name in its given combination, not the basionym / original name. Recombinations are often not published in zoology, in which case dwc:namePublishedInID should be empty.
Examples
+ + + + + + + + + +
taxonConceptID
Identifierhttp://rs.tdwg.org/dwc/terms/taxonConceptID
DefinitionAn identifier for the taxonomic concept to which the record refers - not for the nomenclatural details of a dwc:Taxon.
Comments
Examples8fa58e08-08de-4ac1-b69c-1235340b7001
+ + + + + + + + + +
scientificName
Identifierhttp://rs.tdwg.org/dwc/terms/scientificName
DefinitionThe full scientific name, with authorship and date information if known. When forming part of a dwc:Identification, this should be the name in lowest level taxonomic rank that can be determined. This term should not contain identification qualifications, which should instead be supplied in the dwc:identificationQualifier term.
CommentsThis term should not contain identification qualifications, which should instead be supplied in the dwc:identificationQualifier term. When applied to an Organism or Occurrence, this term should be used to represent the scientific name that was applied to the associated Organism in accordance with the Taxon to which it was or is currently identified. Names should comply with the most recent nomenclatural code. For example, names of hybrids for algae, fungi and plants should follow the rules of the International Code of Nomenclature for algae, fungi, and plants (Shenzhen Code Articles H.1, H.2 and H.3). Thus, use the multiplication sign × (Unicode U+00D7, HTML &times;) to identify a hybrid, not x or X, if possible.
Examples
  • Coleoptera (order)
  • Vespertilionidae (family)
  • Manis (genus)
  • Ctenomys sociabilis (genus + specificEpithet)
  • Ambystoma tigrinum diaboli (genus + specificEpithet + infraspecificEpithet)
  • Roptrocerus typographi (Györfi, 1952) (genus + specificEpithet + scientificNameAuthorship)
  • Quercus agrifolia var. oxyadenia (Torr.) J.T. Howell (genus + specificEpithet + taxonRank + infraspecificEpithet + scientificNameAuthorship)
  • ×Agropogon littoralis (Sm.) C. E. Hubb.
  • Mentha ×smithiana R. A. Graham
  • Agrostis stolonifera L. × Polypogon monspeliensis (L.) Desf.
+ + + + + + + + + +
acceptedNameUsage
Identifierhttp://rs.tdwg.org/dwc/terms/acceptedNameUsage
DefinitionThe full name, with authorship and date information if known, of the currently valid (zoological) or accepted (botanical) dwc:Taxon.
CommentsThe full scientific name, with authorship and date information if known, of the accepted (botanical) or valid (zoological) name in cases where the provided dwc:scientificName is considered by the reference indicated in the dwc:nameAccordingTo property, or of the content provider, to be a synonym or misapplied name. When applied to a dwc:Organism or dwc:Occurrence, this term should be used in cases where a content provider regards the provided dwc:scientificName to be inconsistent with the taxonomic perspective of the content provider. For example, there are many discrepancies within specimen collections and observation datasets between the recorded name (e.g., the most recent identification from an expert who examined a specimen, or a field identification for an observed dwc:Organism), and the name asserted by the content provider to be taxonomically accepted.
ExamplesTamias minimus (valid name for Eutamias minimus)
+ + + + + + + + + +
parentNameUsage
Identifierhttp://rs.tdwg.org/dwc/terms/parentNameUsage
DefinitionThe full name, with authorship and date information if known, of the direct, most proximate higher-rank parent dwc:Taxon (in a classification) of the most specific element of the dwc:scientificName.
Comments
Examples
  • Rubiaceae
  • Gruiformes
  • Testudinae
+ + + + + + + + + +
originalNameUsage
Identifierhttp://rs.tdwg.org/dwc/terms/originalNameUsage
DefinitionThe taxon name, with authorship and date information if known, as it originally appeared when first established under the rules of the associated dwc:nomenclaturalCode. The basionym (botany) or basonym (bacteriology) of the dwc:scientificName or the senior/earlier homonym for replaced names.
CommentsThe full scientific name, with authorship and date information if known, of the name usage in which the terminal element of the dwc:scientificName was originally established under the rules of the associated dwc:nomenclaturalCode. For example, for names governed by the ICNafp, this term would indicate the basionym of a record representing a subsequent combination. Unlike basionyms, however, this term can apply to scientific names at all ranks.
Examples
  • Pinus abies
  • Gasterosteus saltatrix Linnaeus 1768
+ + + + + + + + + +
nameAccordingTo
Identifierhttp://rs.tdwg.org/dwc/terms/nameAccordingTo
DefinitionThe reference to the source in which the specific taxon concept circumscription is defined or implied - traditionally signified by the Latin "sensu" or "sec." (from secundum, meaning "according to"). For taxa that result from identifications, a reference to the keys, monographs, experts and other sources should be given.
CommentsThis term provides context to the dwc:scientificName. Together with the dwc:scientificName, separated by sensu or sec., it forms the taxon concept label, which may be seen as having the same relationship to dwc:taxonConceptID as, for example, dwc:acceptedNameUsage has to dwc:acceptedNameUsageID. When not provided, in Taxon Core data sets the dwc:nameAccordingTo can be taken to be the data set. In this case the data set mostly provides sufficient context to infer the delimitation of the taxon and its relationship with other taxa. In Occurrence Core data sets, when not provided, dwc:nameAccordingTo can be an underlying taxonomy of the data set, e.g. Plants of the World Online (http://powo.science.kew.org/) for vascular plant records in iNaturalist (in which case it should be provided), or, which is the case for most dwc:PreservedSpecimen data sets, the dwc:Identification, in which case there is no further context.
ExamplesFranz NM, Cardona-Duque J (2013) Description of two new species and phylogenetic reassessment of Perelleschus Wibmer & O’Brien, 1986 (Coleoptera: Curculionidae), with a complete taxonomic concept history of Perelleschus sec. Franz & Cardona-Duque, 2013. Syst Biodivers. 11: 209–236. (as the full citation of the Franz & Cardona-Duque (2013) in Perelleschus splendida sec. Franz & Cardona-Duque (2013))
+ + + + + + + + + +
namePublishedIn
Identifierhttp://rs.tdwg.org/dwc/terms/namePublishedIn
DefinitionA reference for the publication in which the dwc:scientificName was originally established under the rules of the associated dwc:nomenclaturalCode.
CommentsA citation of the first publication of the name in its given combination, not the basionym / original name. Recombinations are often not published in zoology, in which case dwc:namePublishedIn should be empty.
Examples
  • Pearson O. P., and M. I. Christie. 1985. Historia Natural, 5(37):388
  • Forel, Auguste, Diagnosies provisoires de quelques espèces nouvelles de fourmis de Madagascar, récoltées par M. Grandidier., Annales de la Societe Entomologique de Belgique, Comptes-rendus des Seances 30, 1886
+ + + + + + + + + +
namePublishedInYear
Identifierhttp://rs.tdwg.org/dwc/terms/namePublishedInYear
DefinitionThe four-digit year in which the dwc:scientificName was published.
Comments
Examples
  • 1915
  • 2008
+ + + + + + + + + +
higherClassification
Identifierhttp://rs.tdwg.org/dwc/terms/higherClassification
DefinitionA list (concatenated and separated) of taxa names terminating at the rank immediately superior to the referenced dwc:Taxon.
CommentsRecommended best practice is to separate the values in a list with space vertical bar space ( | ), with terms in order from the highest taxonomic rank to the lowest.
Examples
  • Plantae | Tracheophyta | Magnoliopsida | Ranunculales | Ranunculaceae | Ranunculus
  • Animalia
  • Animalia | Chordata | Vertebrata | Mammalia | Theria | Eutheria | Rodentia | Hystricognatha | Hystricognathi | Ctenomyidae | Ctenomyini | Ctenomys
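
The recommended `higherClassification` format (values separated by space, vertical bar, space, ordered from the highest rank to the lowest) can be sketched as follows. The function name `build_higher_classification` is hypothetical, and the sketch assumes the caller already supplies the parent taxon names in rank order.

```python
def build_higher_classification(parent_names):
    """Join parent taxon names (highest rank first) with ' | ', skipping blanks."""
    cleaned = [name.strip() for name in parent_names if name and name.strip()]
    return " | ".join(cleaned)


lineage = ["Plantae", "Tracheophyta", "Magnoliopsida", "Ranunculales", "Ranunculaceae", "Ranunculus"]
higher_classification = build_higher_classification(lineage)
```

Missing ranks are skipped rather than emitted as empty segments, which is a design choice of this sketch and not a requirement stated by the term.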
+ + + + + + + + + +
kingdom
Identifierhttp://rs.tdwg.org/dwc/terms/kingdom
DefinitionThe full scientific name of the kingdom in which the dwc:Taxon is classified.
Comments
Examples
  • Animalia
  • Archaea
  • Bacteria
  • Chromista
  • Fungi
  • Plantae
  • Protozoa
  • Viruses
+ + + + + + + + + +
phylum
Identifierhttp://rs.tdwg.org/dwc/terms/phylum
DefinitionThe full scientific name of the phylum or division in which the dwc:Taxon is classified.
Comments
Examples
  • Chordata (phylum)
  • Bryophyta (division)
+ + + + + + + + + +
class
Identifierhttp://rs.tdwg.org/dwc/terms/class
DefinitionThe full scientific name of the class in which the dwc:Taxon is classified.
Comments
Examples
  • Mammalia
  • Hepaticopsida
+ + + + + + + + + +
order
Identifierhttp://rs.tdwg.org/dwc/terms/order
DefinitionThe full scientific name of the order in which the dwc:Taxon is classified.
Comments
Examples
  • Carnivora
  • Monocleales
+ + + + + + + + + +
superfamily
Identifierhttp://rs.tdwg.org/dwc/terms/superfamily
DefinitionThe full scientific name of the superfamily in which the dwc:Taxon is classified.
CommentsA taxonomic category subordinate to an order and superior to a family. According to ICZN article 29.2, the suffix -oidea is used for a superfamily name.
Examples
  • Achatinoidea
  • Cerithioidea
  • Helicoidea
  • Hypsibioidea
  • Valvatoidea
  • Zonitoidea
+ + + + + + + + + +
family
Identifierhttp://rs.tdwg.org/dwc/terms/family
DefinitionThe full scientific name of the family in which the dwc:Taxon is classified.
Comments
Examples
  • Felidae
  • Monocleaceae
+ + + + + + + + + +
subfamily
Identifierhttp://rs.tdwg.org/dwc/terms/subfamily
DefinitionThe full scientific name of the subfamily in which the dwc:Taxon is classified.
Comments
Examples
  • Periptyctinae
  • Orchidoideae
  • Sphindociinae
+ + + + + + + + + +
tribe
Identifierhttp://rs.tdwg.org/dwc/terms/tribe
DefinitionThe full scientific name of the tribe in which the dwc:Taxon is classified.
Comments
Examples
  • Ortaliini
  • Arethuseae
+ + + + + + + + + +
subtribe
Identifierhttp://rs.tdwg.org/dwc/terms/subtribe
DefinitionThe full scientific name of the subtribe in which the dwc:Taxon is classified.
Comments
Examples
  • Plotinini
  • Typhaeini
+ + + + + + + + + +
genus
Identifierhttp://rs.tdwg.org/dwc/terms/genus
DefinitionThe full scientific name of the genus in which the dwc:Taxon is classified.
Comments
Examples
  • Puma
  • Monoclea
+ + + + + + + + + +
genericName
Identifierhttp://rs.tdwg.org/dwc/terms/genericName
DefinitionThe genus part of the dwc:scientificName without authorship.
CommentsFor synonyms the accepted genus and the genus part of the name may be different. The term dwc:genericName should be used together with dwc:specificEpithet to form a binomial and with dwc:infraspecificEpithet to form a trinomial. The term dwc:genericName should only be used for combinations. Uninomials of generic rank do not have a dwc:genericName.
ExamplesFelis (for scientificName Felis concolor, with accompanying values of Puma concolor in acceptedNameUsage and Puma in genus)
+ + + + + + + + + +
subgenus
Identifierhttp://rs.tdwg.org/dwc/terms/subgenus
DefinitionThe full scientific name of the subgenus in which the dwc:Taxon is classified.
CommentsA value for this term should be a complete subgenus name as required by the appropriate nomenclatural code.
Examples
  • Abacetus (Parastygis)
  • Dicranum subgen. Orthodicranum
+ + + + + + + + + +
infragenericEpithet
Identifierhttp://rs.tdwg.org/dwc/terms/infragenericEpithet
DefinitionThe infrageneric part of a binomial name at ranks above species but below genus.
CommentsThe term dwc:infragenericEpithet should be used in conjunction with dwc:genericName, dwc:specificEpithet, dwc:infraspecificEpithet, dwc:taxonRank and dwc:scientificNameAuthorship to represent the individual elements of the complete dwc:scientificName. It can be used to indicate the subgenus placement of a species, which in zoology is often given in parentheses. Can also be used to share infrageneric names such as botanical sections (e.g., Vicia sect. Cracca).
Examples
  • Abacetillus (for scientificName Abacetus (Abacetillus) ambiguus)
  • Cracca (for scientificName Vicia sect. Cracca)
+ + + + + + + + + +
specificEpithet
Identifierhttp://rs.tdwg.org/dwc/terms/specificEpithet
DefinitionThe name of the first or species epithet of the dwc:scientificName.
Comments
Examples
  • concolor
  • gottschei
+ + + + + + + + + +
infraspecificEpithet
Identifierhttp://rs.tdwg.org/dwc/terms/infraspecificEpithet
DefinitionThe name of the lowest or terminal infraspecific epithet of the dwc:scientificName, excluding any rank designation.
CommentsIn botany, name strings in literature and identifications may have multiple infraspecific ranks. According to the International Code of Nomenclature for algae, fungi, and plants (Shenzhen Code Art. 6.7 and Art. 24.1), valid names only have two epithets, with the lowest rank being the dwc:infraspecificEpithet. For example: the dwc:infraspecificEpithet in the string Indigofera charlieriana subsp. sessilis var. scaberrima is scaberrima and the dwc:scientificName is Indigofera charlieriana var. scaberrima (Schinz) J.B.Gillett. Use dwc:verbatimIdentification for the full name string used in a dwc:Identification.
Examples
  • concolor (for scientificName Puma concolor concolor (Linnaeus, 1771))
  • oxyadenia (for scientificName Quercus agrifolia var. oxyadenia (Torr.) J.T. Howell)
  • laxa (for scientificName Cheilanthes hirta f. laxa (Kunze) W.Jacobsen & N.Jacobsen)
  • scaberrima (for scientificName Indigofera charlieriana var. scaberrima (Schinz) J.B.Gillett)
+ + + + + + + + + +
cultivarEpithet
Identifierhttp://rs.tdwg.org/dwc/terms/cultivarEpithet
DefinitionPart of the name of a cultivar, cultivar group or grex that follows the dwc:scientificName.
CommentsAccording to the Rules of the Cultivated Plant Code, a cultivar name consists of a botanical name followed by a cultivar epithet. The value given as the dwc:cultivarEpithet should exclude any quotes. The term dwc:taxonRank should be used to indicate which type of cultivated plant name (e.g. cultivar, cultivar group, grex) is concerned. This epithet, including any enclosing apostrophes or suffix, should be provided in dwc:scientificName as well.
Examples
  • King Edward (for scientificName Solanum tuberosum 'King Edward' and taxonRank cultivar)
  • Mishmiense (for scientificName Rhododendron boothii Mishmiense Group and taxonRank cultivar group)
  • Atlantis (for scientificName Paphiopedilum Atlantis grex and taxonRank grex)
+ + + + + + + + + +
taxonRank
Identifierhttp://rs.tdwg.org/dwc/terms/taxonRank
DefinitionThe taxonomic rank of the most specific name in the dwc:scientificName.
CommentsRecommended best practice is to use a controlled vocabulary. The taxon ranks of algae, fungi and plants are defined in the International Code of Nomenclature for algae, fungi, and plants (Shenzhen Code Articles H3.2, H4.4 and H.3.1).
Examples
  • subspecies
  • varietas
  • forma
  • species
  • genus
  • nothogenus
  • nothospecies
  • nothosubspecies
+ + + + + + + + + +
verbatimTaxonRank
Identifierhttp://rs.tdwg.org/dwc/terms/verbatimTaxonRank
DefinitionThe taxonomic rank of the most specific name in the dwc:scientificName as it appears in the original record.
Comments
Examples
  • Agamospecies
  • sub-lesus
  • prole
  • apomict
  • nothogrex
  • sp.
  • subsp.
  • var.
+ + + + + + + + + +
scientificNameAuthorship
Identifierhttp://rs.tdwg.org/dwc/terms/scientificNameAuthorship
DefinitionThe authorship information for the dwc:scientificName formatted according to the conventions of the applicable dwc:nomenclaturalCode.
Comments
Examples
  • (Torr.) J.T. Howell
  • (Martinovský) Tzvelev
  • (Györfi, 1952)
+ + + + + + + + + +
vernacularName
Identifierhttp://rs.tdwg.org/dwc/terms/vernacularName
DefinitionA common or vernacular name.
Comments
Examples
  • Andean Condor
  • Condor Andino
  • American Eagle
  • Gänsegeier
+ + + + + + + + + +
nomenclaturalCode
Identifierhttp://rs.tdwg.org/dwc/terms/nomenclaturalCode
DefinitionThe nomenclatural code (or codes in the case of an ambiregnal name) under which the dwc:scientificName is constructed.
CommentsRecommended best practice is to use a controlled vocabulary.
Examples
  • ICN
  • ICZN
  • BC
  • ICNCP
  • BioCode
+ + + + + + + + + +
taxonomicStatus
Identifierhttp://rs.tdwg.org/dwc/terms/taxonomicStatus
DefinitionThe status of the use of the dwc:scientificName as a label for a taxon. Requires taxonomic opinion to define the scope of a dwc:Taxon. Rules of priority are then used, combined with expert opinion, to define the taxonomic status of the nomenclature contained in that scope. It must be linked to a specific taxonomic reference that defines the concept.
CommentsRecommended best practice is to use a controlled vocabulary.
Examples
  • invalid
  • misapplied
  • homotypic synonym
  • accepted
+ + + + + + + + + +
nomenclaturalStatus
Identifierhttp://rs.tdwg.org/dwc/terms/nomenclaturalStatus
DefinitionThe status related to the original publication of the name and its conformance to the relevant rules of nomenclature. It is based essentially on an algorithm according to the business rules of the code. It requires no taxonomic opinion.
Comments
Examples
  • nom. ambig.
  • nom. illeg.
  • nom. subnud.
+ + + + + + + + + +
taxonRemarks
Identifierhttp://rs.tdwg.org/dwc/terms/taxonRemarks
DefinitionComments or notes about the taxon or name.
Comments
Examplesthis name is a misspelling in common use
+ + +## MeasurementOrFact + + + + + + + + + + + +
MeasurementOrFact Class
Identifierhttp://rs.tdwg.org/dwc/terms/MeasurementOrFact
DefinitionA measurement of or fact about an rdfs:Resource (http://www.w3.org/2000/01/rdf-schema#Resource).
CommentsResources can be thought of as identifiable records or instances of classes and may include, but need not be limited to instances of dwc:Occurrence, dwc:Organism, dwc:MaterialEntity, dwc:Event, dcterms:Location, dwc:GeologicalContext, dwc:Identification, or dwc:Taxon.
Examples
  • the weight of a dwc:Organism in grams
  • the number of placental scars
  • surface water temperature in Celsius
+ + + + + + + + + + +
measurementID
Identifierhttp://rs.tdwg.org/dwc/terms/measurementID
DefinitionAn identifier for the dwc:MeasurementOrFact (information pertaining to measurements, facts, characteristics, or assertions). May be a global unique identifier or an identifier specific to the data set.
Comments
Examples9c752d22-b09a-11e8-96f8-529269fb1459
+ + + + + + + + + +
parentMeasurementID
Identifierhttp://rs.tdwg.org/dwc/terms/parentMeasurementID
DefinitionAn identifier for a broader dwc:MeasurementOrFact that groups this and potentially other dwc:MeasurementOrFacts.
CommentsMay be a globally unique identifier or an identifier specific to the data set.
Examples
  • 9c752d22-b09a-11e8-96f8-529269fb1459
  • E1_E1_O1_M1
+ + + + + + + + + +
measurementType
Identifierhttp://rs.tdwg.org/dwc/terms/measurementType
DefinitionThe nature of the measurement, fact, characteristic, or assertion.
CommentsRecommended best practice is to use a controlled vocabulary. This term has an equivalent in the dwciri: namespace that allows only an IRI as a value, whereas this term allows for any string literal value.
Examples
  • tail length
  • temperature
  • trap line length
  • survey area
  • trap type
+ + + + + + + + + +
verbatimMeasurementType
Identifierhttp://rs.tdwg.org/dwc/terms/verbatimMeasurementType
DefinitionA string representing the type of measurement or fact as it appeared in the original record.
CommentsThis term is meant to allow the capture of an unaltered original name for a measurement or fact type. This term is meant to be used in addition to dwc:measurementType, not instead of it.
Examples
  • water_temp
  • Fish biomass
  • sampling net mesh size
+ + + + + + + + + +
measurementValue
Identifierhttp://rs.tdwg.org/dwc/terms/measurementValue
DefinitionThe value of the measurement, fact, characteristic, or assertion.
CommentsThis term has an equivalent in the dwciri: namespace that allows only an IRI as a value, whereas this term allows for any string literal value.
Examples
  • 45
  • 20
  • 1
  • 14.5
  • UV-light
+ + + + + + + + + +
measurementAccuracy
Identifierhttp://rs.tdwg.org/dwc/terms/measurementAccuracy
DefinitionThe description of the potential error associated with the dwc:measurementValue.
Comments
Examples
  • 0.01
  • normal distribution with variation of 2 m
+ + + + + + + + + +
measurementUnit
Identifierhttp://rs.tdwg.org/dwc/terms/measurementUnit
DefinitionThe units associated with the dwc:measurementValue.
CommentsRecommended best practice is to use the International System of Units (SI). This term has an equivalent in the dwciri: namespace that allows only an IRI as a value, whereas this term allows for any string literal value.
Examples
  • m
  • g
  • l
  • °C
  • mm
  • km²
  • %
  • hh:mm:ss
+ + + + + + + + + +
measurementDeterminedBy
Identifierhttp://rs.tdwg.org/dwc/terms/measurementDeterminedBy
DefinitionA list (concatenated and separated) of names of people, groups, or organizations who determined the value of the dwc:MeasurementOrFact.
CommentsRecommended best practice is to separate the values in a list with space vertical bar space ( | ). This term has an equivalent in the dwciri: namespace that allows only an IRI as a value, whereas this term allows for any string literal value.
Examples
  • Rob Guralnick
  • Peter Desmet | Stijn Van Hoey
+ + + + + + + + + +
measurementDeterminedDate
Identifierhttp://rs.tdwg.org/dwc/terms/measurementDeterminedDate
DefinitionThe date on which the dwc:MeasurementOrFact was made.
CommentsRecommended best practice is to use a date that conforms to ISO 8601-1:2019.
Examples
  • 1963-03-08T14:07-06:00 (8 Mar 1963 at or after 2:07pm and before 2:08pm in the time zone six hours earlier than UTC)
  • 2009-02-20T08:40Z (20 February 2009 at or after 8:40am and before 8:41 UTC)
  • 2018-08-29T15:19 (29 August 2018 at or after 3:19pm and before 3:20pm local time)
  • 1809-02-12 (within the day 12 February 1809)
  • 1906-06 (in the month of June 1906)
  • 1971 (in the year 1971)
  • 2007-03-01T13:00:00Z/2008-05-11T15:30:00Z (some time within the interval beginning 1 March 2007 at 1pm UTC and before 11 May 2008 at 3:30pm UTC)
  • 1900/1909 (some time within the interval between the beginning of the year 1900 and before the year 1909)
  • 2007-11-13/15 (some time in the interval between the beginning of 13 November 2007 and before 15 November 2007)
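
The date examples above follow ISO 8601, with minute precision, a `Z` suffix for UTC, an explicit offset for other zones, and `/` between the two ends of an interval. A minimal Python sketch of producing such strings from `datetime` values is given below. The helper names `to_dwc_datetime` and `to_dwc_interval` are hypothetical, and minute precision is an assumption of the sketch, not something the term requires.

```python
from datetime import datetime, timedelta, timezone


def to_dwc_datetime(value):
    """Format a datetime as an ISO 8601 string with minute precision."""
    if value.tzinfo is None:
        # Naive datetimes are written without an offset (local time).
        return value.strftime("%Y-%m-%dT%H:%M")
    if value.utcoffset() == timedelta(0):
        # UTC is written with the "Z" designator.
        return value.strftime("%Y-%m-%dT%H:%MZ")
    return value.isoformat(timespec="minutes")


def to_dwc_interval(start, end):
    """Join two datetimes into an ISO 8601 interval using '/'."""
    return f"{to_dwc_datetime(start)}/{to_dwc_datetime(end)}"


central = timezone(timedelta(hours=-6))
with_offset = to_dwc_datetime(datetime(1963, 3, 8, 14, 7, tzinfo=central))
in_utc = to_dwc_datetime(datetime(2009, 2, 20, 8, 40, tzinfo=timezone.utc))
local_time = to_dwc_datetime(datetime(2018, 8, 29, 15, 19))
interval = to_dwc_interval(
    datetime(2007, 3, 1, 13, 0, tzinfo=timezone.utc),
    datetime(2008, 5, 11, 15, 30, tzinfo=timezone.utc),
)
```

The same formatting would apply to other Darwin Core date terms that recommend ISO 8601, such as `relationshipEstablishedDate`.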
+ + + + + + + + + +
measurementMethod
Identifierhttp://rs.tdwg.org/dwc/terms/measurementMethod
DefinitionA description of or reference to (publication, URI) the method or protocol used to determine the measurement, fact, characteristic, or assertion.
CommentsThis term has an equivalent in the dwciri: namespace that allows only an IRI as a value, whereas this term allows for any string literal value.
Examples
  • minimum convex polygon around burrow entrances (for a home range area)
  • barometric altimeter (for an elevation)
+ + + + + + + + + +
measurementRemarks
Identifierhttp://rs.tdwg.org/dwc/terms/measurementRemarks
DefinitionComments or notes accompanying the dwc:MeasurementOrFact.
Comments
Examplestip of tail missing
+ + +## ResourceRelationship + + + + + + + + + + + +
ResourceRelationship Class
Identifierhttp://rs.tdwg.org/dwc/terms/ResourceRelationship
DefinitionA relationship of one rdfs:Resource (http://www.w3.org/2000/01/rdf-schema#Resource) to another.
CommentsResources can be thought of as identifiable records or instances of classes and may include, but need not be limited to instances of dwc:Occurrence, dwc:Organism, dwc:MaterialEntity, dwc:Event, dcterms:Location, dwc:GeologicalContext, dwc:Identification, or dwc:Taxon.
Examples
  • an instance of a dwc:Organism is the mother of another instance of a dwc:Organism
  • a uniquely identified dwc:Occurrence represents the same dwc:Occurrence as another uniquely identified dwc:Occurrence
  • a dwc:MaterialEntity is a subsample of another dwc:MaterialEntity
+ + + + + + + + + + +
resourceRelationshipID
Identifierhttp://rs.tdwg.org/dwc/terms/resourceRelationshipID
DefinitionAn identifier for an instance of relationship between one resource (the subject) and another (dwc:relatedResource, the object).
Comments
Examples04b16710-b09c-11e8-96f8-529269fb1459
+ + + + + + + + + +
resourceID
Identifierhttp://rs.tdwg.org/dwc/terms/resourceID
DefinitionAn identifier for the resource that is the subject of the relationship.
Comments
Examplesf809b9e0-b09b-11e8-96f8-529269fb1459
+ + + + + + + + + +
relationshipOfResourceID
Identifierhttp://rs.tdwg.org/dwc/terms/relationshipOfResourceID
DefinitionAn identifier for the relationship type (predicate) that connects the subject identified by dwc:resourceID to its object identified by dwc:relatedResourceID.
CommentsRecommended best practice is to use the identifiers of the terms in a controlled vocabulary, such as the OBO Relation Ontology.
Examples
+ + + + + + + + + +
relatedResourceID
Identifierhttp://rs.tdwg.org/dwc/terms/relatedResourceID
DefinitionAn identifier for a related resource (the object, rather than the subject of the relationship).
Comments
Examplesdc609808-b09b-11e8-96f8-529269fb1459
+ + + + + + + + + +
relationshipOfResource
Identifierhttp://rs.tdwg.org/dwc/terms/relationshipOfResource
DefinitionThe relationship of the subject (identified by dwc:resourceID) to the object (identified by dwc:relatedResourceID).
CommentsRecommended best practice is to use a controlled vocabulary.
Examples
  • same as
  • duplicate of
  • mother of
  • offspring of
  • sibling of
  • parasite of
  • host of
  • valid synonym of
  • located within
  • pollinator of members of taxon
  • pollinated specific plant
  • pollinated by members of taxon
  • on slab with
+ + + + + + + + + +
relationshipAccordingTo
Identifierhttp://rs.tdwg.org/dwc/terms/relationshipAccordingTo
DefinitionThe source (person, organization, publication, reference) establishing the relationship between the two resources.
Comments
ExamplesJulie Woodruff
+ + + + + + + + + +
relationshipEstablishedDate
Identifierhttp://rs.tdwg.org/dwc/terms/relationshipEstablishedDate
DefinitionThe date-time on which the relationship between the two resources was established.
CommentsRecommended best practice is to use a date that conforms to ISO 8601-1:2019.
Examples
  • 1963-03-08T14:07-06:00 (8 Mar 1963 at or after 2:07pm and before 2:08pm in the time zone six hours earlier than UTC)
  • 2009-02-20T08:40Z (20 February 2009 at or after 8:40am and before 8:41 UTC)
  • 2018-08-29T15:19 (29 August 2018 at or after 3:19pm and before 3:20pm local time)
  • 1809-02-12 (within the day 12 February 1809)
  • 1906-06 (in the month of June 1906)
  • 1971 (in the year 1971)
  • 2007-03-01T13:00:00Z/2008-05-11T15:30:00Z (some time within the interval beginning 1 March 2007 at 1pm UTC and before 11 May 2008 at 3:30pm UTC)
  • 1900/1909 (some time within the interval between the beginning of the year 1900 and before the year 1909)
  • 2007-11-13/15 (some time in the interval between the beginning of 13 November 2007 and before 15 November 2007)
+ + + + + + + + + +
relationshipRemarks
Identifierhttp://rs.tdwg.org/dwc/terms/relationshipRemarks
DefinitionComments or notes about the relationship between the two resources.
Comments
Examples
  • mother and offspring collected from the same nest
  • pollinator captured in the act
+ + +## UseWithIRI + +For more information on `UseWithIRI`, see [Section 2.5 of the RDF Guide](https://dwc.tdwg.org/rdf/#25-terms-in-the-dwciri-namespace-normative). + + + + + + + + + + + + +
behavior
Identifierhttp://rs.tdwg.org/dwc/iri/behavior
DefinitionA description of the behavior shown by the subject at the time the dwc:Occurrence was recorded.
CommentsRecommended best practice is to use a controlled vocabulary. Terms in the dwciri: namespace are intended to be used in RDF with non-literal objects.
Examples
+ + + + + + + + + +
caste
Identifierhttp://rs.tdwg.org/dwc/iri/caste
DefinitionCategorisation of individuals for eusocial species (including some mammals and arthropods).
CommentsRecommended best practice is to use a controlled vocabulary that aligns best with the dwc:Taxon. Terms in the dwciri: namespace are intended to be used in RDF with non-literal objects.
Examples
+ + + + + + + + + +
dataGeneralizations
Identifierhttp://rs.tdwg.org/dwc/iri/dataGeneralizations
DefinitionActions taken to make the shared data less specific or complete than in its original form. Suggests that alternative data of higher quality may be available on request.
CommentsTerms in the dwciri: namespace are intended to be used in RDF with non-literal objects.
Examples
+ + + + + + + + + +
degreeOfEstablishment
Identifierhttp://rs.tdwg.org/dwc/iri/degreeOfEstablishment
DefinitionThe degree to which a dwc:Organism survives, reproduces, and expands its range at the given place and time.
CommentsRecommended best practice is to use IRIs from the controlled vocabulary designated for use with this term, listed at http://rs.tdwg.org/dwc/doc/doe/. For details, refer to https://doi.org/10.3897/biss.3.38084 . Terms in the dwciri: namespace are intended to be used in RDF with non-literal objects.
Examples
+ + + + + + + + + +
disposition
Identifierhttp://rs.tdwg.org/dwc/iri/disposition
DefinitionThe current state of a specimen with respect to the collection identified in dwc:collectionCode or dwc:collectionID.
CommentsRecommended best practice is to use a controlled vocabulary. Terms in the dwciri: namespace are intended to be used in RDF with non-literal objects.
Examples
+ + + + + + + + + +
earliestGeochronologicalEra
Identifierhttp://rs.tdwg.org/dwc/iri/earliestGeochronologicalEra
DefinitionUse to link a dwc:GeologicalContext instance to chronostratigraphic time periods at the lowest possible level in a standardized hierarchy. Use this property to point to the earliest possible geological time period from which the dwc:MaterialEntity was collected.
CommentsRecommended best practice is to use an IRI from a controlled vocabulary. A "convenience property" that replaces Darwin Core literal-value terms related to geological context. See Section 2.7.6 of the Darwin Core RDF Guide for details.
Examples
+ + + + + + + + + +
establishmentMeans
Identifierhttp://rs.tdwg.org/dwc/iri/establishmentMeans
DefinitionStatement about whether a dwc:Organism has been introduced to a given place and time through the direct or indirect activity of modern humans.
CommentsRecommended best practice is to use IRIs from the controlled vocabulary designated for use with this term, listed at http://rs.tdwg.org/dwc/doc/em/. For details, refer to https://doi.org/10.3897/biss.3.38084 . Terms in the dwciri: namespace are intended to be used in RDF with non-literal objects.
Examples
+ + + + + + + + + +
eventType
Identifierhttp://rs.tdwg.org/dwc/iri/eventType
DefinitionThe nature of the dwc:Event.
CommentsRecommended best practice is to use a controlled vocabulary. Regardless of the dwc:eventType, the interval of the dwc:Event can be captured in dwc:eventDate. Terms in the dwciri: namespace are intended to be used in RDF with non-literal objects.
Examples
+ + + + + + + + + +
fieldNotes
Identifierhttp://rs.tdwg.org/dwc/iri/fieldNotes
DefinitionOne of a) an indicator of the existence of, b) a reference to (publication, URI), or c) the text of notes taken in the field about the dwc:Event.
CommentsThe subject is a dwc:Event instance and the object is a (possibly IRI-identified) resource that is the field notes.
Examples
+ + + + + + + + + +
fieldNumber
Identifierhttp://rs.tdwg.org/dwc/iri/fieldNumber
DefinitionAn identifier given to the event in the field. Often serves as a link between field notes and the dwc:Event.
CommentsThe subject is a (possibly IRI-identified) resource that is the field notes and the object is a dwc:Event instance.
Examples
+ + + + + + + + + +
footprintSRS
Identifierhttp://rs.tdwg.org/dwc/iri/footprintSRS
DefinitionThe ellipsoid, geodetic datum, or spatial reference system (SRS) upon which the geometry given in dwc:footprintWKT is based.
CommentsRecommended best practice is to use an IRI for the EPSG code of the SRS, if known. Otherwise use a controlled vocabulary IRI for the name or code of the geodetic datum, if known. Otherwise use a controlled vocabulary IRI for the name or code of the ellipsoid, if known. Otherwise use an IRI for the value corresponding to not recorded.
Examples
+ + + + + + + + + +
footprintWKT
Identifierhttp://rs.tdwg.org/dwc/iri/footprintWKT
DefinitionA Well-Known Text (WKT) representation of the shape (footprint, geometry) that defines the dcterms:Location. A dcterms:Location may have both a point-radius representation (see dwc:decimalLatitude) and a footprint representation, and they may differ from each other.
CommentsTerms in the dwciri: namespace are intended to be used in RDF with non-literal objects.
Examples
+ + + + + + + + + +
fundingAttribution
Identifierhttp://rs.tdwg.org/dwc/iri/fundingAttribution
DefinitionAn organization or agency that provided funding for a project.
CommentsTerms in the dwciri: namespace are intended to be used in RDF with non-literal objects.
Examples
+ + + + + + + + + +
fromLithostratigraphicUnit
Identifierhttp://rs.tdwg.org/dwc/iri/fromLithostratigraphicUnit
DefinitionUse to link a dwc:GeologicalContext instance to an IRI-identified lithostratigraphic unit at the lowest possible level in a hierarchy.
CommentsRecommended best practice is to use an IRI from a controlled vocabulary. A "convenience property" that replaces Darwin Core literal-value terms related to geological context. See Section 2.7.7 of the Darwin Core RDF Guide for details.
Examples
+ + + + + + + + + +
geodeticDatum
Identifierhttp://rs.tdwg.org/dwc/iri/geodeticDatum
DefinitionThe ellipsoid, geodetic datum, or spatial reference system (SRS) upon which the geographic coordinates given in dwc:decimalLatitude and dwc:decimalLongitude are based.
CommentsRecommended best practice is to use an IRI for the EPSG code of the SRS, if known. Otherwise use a controlled vocabulary for the name or code of the geodetic datum, if known. Otherwise use a controlled vocabulary for the name or code of the ellipsoid, if known. If none of these is known, use an IRI corresponding to the value not recorded.
Examples
+ + + + + + + + + +
georeferencedBy
Identifierhttp://rs.tdwg.org/dwc/iri/georeferencedBy
DefinitionA person, group, or organization who determined the georeference (spatial representation) for the dcterms:Location.
CommentsTerms in the dwciri: namespace are intended to be used in RDF with non-literal objects.
Examples
+ + + + + + + + + +
georeferenceProtocol
Identifierhttp://rs.tdwg.org/dwc/iri/georeferenceProtocol
DefinitionA description or reference to the methods used to determine the spatial footprint, coordinates, and uncertainties.
CommentsTerms in the dwciri: namespace are intended to be used in RDF with non-literal objects.
Examples
+ + + + + + + + + +
georeferenceSources
Identifierhttp://rs.tdwg.org/dwc/iri/georeferenceSources
DefinitionA map, gazetteer, or other resource used to georeference the dcterms:Location.
CommentsTerms in the dwciri: namespace are intended to be used in RDF with non-literal objects.
Examples
+ + + + + + + + + +
georeferenceVerificationStatus
Identifierhttp://rs.tdwg.org/dwc/iri/georeferenceVerificationStatus
DefinitionA categorical description of the extent to which the georeference has been verified to represent the best possible spatial description for the dcterms:Location of the dwc:Occurrence.
CommentsRecommended best practice is to use a controlled vocabulary. Terms in the dwciri: namespace are intended to be used in RDF with non-literal objects.
Examples
+ + + + + + + + + +
habitat
Identifierhttp://rs.tdwg.org/dwc/iri/habitat
DefinitionA category or description of the habitat in which the dwc:Event occurred.
CommentsTerms in the dwciri: namespace are intended to be used in RDF with non-literal objects.
Examples
+ + + + + + + + + +
identificationQualifier
Identifierhttp://rs.tdwg.org/dwc/iri/identificationQualifier
DefinitionA controlled value to express the determiner's doubts about the dwc:Identification.
CommentsTerms in the dwciri: namespace are intended to be used in RDF with non-literal objects.
Examples
+ + + + + + + + + +
identificationVerificationStatus
Identifierhttp://rs.tdwg.org/dwc/iri/identificationVerificationStatus
DefinitionA categorical indicator of the extent to which the taxonomic identification has been verified to be correct.
CommentsTerms in the dwciri: namespace are intended to be used in RDF with non-literal objects. Recommended best practice is to use a controlled vocabulary such as that used in HISPID and ABCD.
Examples
+ + + + + + + + + +
identifiedBy
Identifierhttp://rs.tdwg.org/dwc/iri/identifiedBy
DefinitionA person, group, or organization who assigned the dwc:Taxon to the subject.
CommentsWhen used in the context of an Event (such as in the Humboldt Extension), the subject consists of all of the dwc:Organisms related to the Event. Terms in the dwciri: namespace are intended to be used in RDF with non-literal objects.
Examples
+ + + + + + + + + +
inCollection
Identifierhttp://rs.tdwg.org/dwc/iri/inCollection
DefinitionUse to link any subject resource that is part of a collection to the collection containing the resource.
CommentsRecommended best practice is to use an IRI from a controlled registry. A "convenience property" that replaces literal-value terms related to collections and institutions. See Section 2.7.3 of the Darwin Core RDF Guide for details.
Examples
+ + + + + + + + + +
inDataset
Identifierhttp://rs.tdwg.org/dwc/iri/inDataset
DefinitionUse to link a subject dataset record to the dataset which contains it.
CommentsA string literal name of the dataset can be provided using the term dwc:datasetName. See the Darwin Core RDF Guide for details.
Examples
+ + + + + + + + + +
inDescribedPlace
Identifierhttp://rs.tdwg.org/dwc/iri/inDescribedPlace
DefinitionUse to link a dcterms:Location instance subject to the lowest level standardized hierarchically-described resource.
CommentsRecommended best practice is to use an IRI from a controlled registry. A "convenience property" that replaces Darwin Core literal-value terms related to locations. See Section 2.7.5 of the Darwin Core RDF Guide for details.
Exampleshttp://vocab.getty.edu/tgn/1019987
+ + + + + + + + + +
informationWithheld
Identifierhttp://rs.tdwg.org/dwc/iri/informationWithheld
DefinitionAdditional information that exists, but that has not been shared in the given record.
CommentsTerms in the dwciri: namespace are intended to be used in RDF with non-literal objects.
Examples
+ + + + + + + + + +
language
Identifierhttp://purl.org/dc/terms/language
DefinitionA language of the resource.
CommentsRecommended best practice is to use an IRI from the Library of Congress ISO 639-2 scheme http://id.loc.gov/vocabulary/iso639-2
Examples
+ + + + + + + + + +
latestGeochronologicalEra
Identifierhttp://rs.tdwg.org/dwc/iri/latestGeochronologicalEra
DefinitionUse to link a dwc:GeologicalContext instance to chronostratigraphic time periods at the lowest possible level in a standardized hierarchy. Use this property to point to the latest possible geological time period from which the dwc:MaterialEntity was collected.
CommentsRecommended best practice is to use an IRI from a controlled vocabulary. A "convenience property" that replaces Darwin Core literal-value terms related to geological context. See Section 2.7.6 of the Darwin Core RDF Guide for details.
Examples
+ + + + + + + + + +
lifeStage
Identifierhttp://rs.tdwg.org/dwc/iri/lifeStage
DefinitionThe age class or life stage of the dwc:Organism(s) at the time the dwc:Occurrence was recorded.
CommentsRecommended best practice is to use a controlled vocabulary. Terms in the dwciri: namespace are intended to be used in RDF with non-literal objects.
Examples
+ + + + + + + + + +
locationAccordingTo
Identifierhttp://rs.tdwg.org/dwc/iri/locationAccordingTo
DefinitionInformation about the source of this dcterms:Location information. Could be a publication (gazetteer), institution, or team of individuals.
CommentsTerms in the dwciri: namespace are intended to be used in RDF with non-literal objects.
Examples
+ + + + + + + + + +
measurementDeterminedBy
Identifierhttp://rs.tdwg.org/dwc/iri/measurementDeterminedBy
DefinitionA person, group, or organization who determined the value of the dwc:MeasurementOrFact.
CommentsTerms in the dwciri: namespace are intended to be used in RDF with non-literal objects.
Examples
+ + + + + + + + + +
measurementMethod
Identifierhttp://rs.tdwg.org/dwc/iri/measurementMethod
DefinitionThe method or protocol used to determine the measurement, fact, characteristic, or assertion.
CommentsTerms in the dwciri: namespace are intended to be used in RDF with non-literal objects.
Examples
+ + + + + + + + + +
measurementType
Identifierhttp://rs.tdwg.org/dwc/iri/measurementType
DefinitionThe nature of the measurement, fact, characteristic, or assertion.
CommentsRecommended best practice is to use a controlled vocabulary. Terms in the dwciri: namespace are intended to be used in RDF with non-literal objects.
Examples
+ + + + + + + + + +
measurementUnit
Identifierhttp://rs.tdwg.org/dwc/iri/measurementUnit
DefinitionThe units associated with the dwc:measurementValue.
CommentsRecommended best practice is to use a controlled vocabulary such as the Ontology of Units of Measure http://www.wurvoc.org/vocabularies/om-1.8/ of SI units, derived units, or other non-SI units accepted for use within the SI.
Examples
+ + + + + + + + + +
measurementValue
Identifierhttp://rs.tdwg.org/dwc/iri/measurementValue
DefinitionThe value of the measurement, fact, characteristic, or assertion.
CommentsTerms in the dwciri: namespace are intended to be used in RDF with non-literal objects.
Exampleshttp://vocab.nerc.ac.uk/collection/L22/current/TOOL0960/
+ + + + + + + + + +
occurrenceStatus
Identifierhttp://rs.tdwg.org/dwc/iri/occurrenceStatus
DefinitionA statement about the presence or absence of a dwc:Taxon at a dcterms:Location.
CommentsRecommended best practice is to use a controlled vocabulary. Terms in the dwciri: namespace are intended to be used in RDF with non-literal objects.
Examples
+ + + + + + + + + +
organismQuantityType
Identifierhttp://rs.tdwg.org/dwc/iri/organismQuantityType
DefinitionThe type of quantification system used for the quantity of organisms.
CommentsA dwc:organismQuantityType must have a corresponding dwc:organismQuantity.
Examples
+ + + + + + + + + +
pathway
Identifierhttp://rs.tdwg.org/dwc/iri/pathway
DefinitionThe process by which a dwc:Organism came to be in a given place at a given time.
CommentsRecommended best practice is to use IRIs from the controlled vocabulary designated for use with this term, listed at http://rs.tdwg.org/dwc/doc/pw/. For details, refer to https://doi.org/10.3897/biss.3.38084 . Terms in the dwciri: namespace are intended to be used in RDF with non-literal objects.
Examples
+ + + + + + + + + +
preparations
Identifierhttp://rs.tdwg.org/dwc/iri/preparations
DefinitionA preparation or preservation method for a specimen.
CommentsTerms in the dwciri: namespace are intended to be used in RDF with non-literal objects.
Examples
+ + + + + + + + + +
recordedBy
Identifierhttp://rs.tdwg.org/dwc/iri/recordedBy
DefinitionA person, group, or organization responsible for recording the original dwc:Occurrence.
CommentsTerms in the dwciri: namespace are intended to be used in RDF with non-literal objects.
Examples
+ + + + + + + + + +
recordNumber
Identifierhttp://rs.tdwg.org/dwc/iri/recordNumber
DefinitionAn identifier given to the dwc:Occurrence at the time it was recorded. Often serves as a link between field notes and a dwc:Occurrence record, such as a specimen collector's number.
CommentsThe subject is a dwc:Occurrence and the object is a (possibly IRI-identified) resource that is the field notes.
Examples
+ + + + + + + + + +
reproductiveCondition
Identifierhttp://rs.tdwg.org/dwc/iri/reproductiveCondition
DefinitionThe reproductive condition of the biological individual(s) represented in the dwc:Occurrence.
CommentsRecommended best practice is to use a controlled vocabulary. Terms in the dwciri: namespace are intended to be used in RDF with non-literal objects.
Examples
+ + + + + + + + + +
sampleSizeUnit
Identifierhttp://rs.tdwg.org/dwc/iri/sampleSizeUnit
DefinitionThe unit of measurement of the size (time duration, length, area, or volume) of a sample in a sampling dwc:Event.
CommentsA dwciri:sampleSizeUnit must have a corresponding dwc:sampleSizeValue. Recommended best practice is to use a controlled vocabulary such as the Ontology of Units of Measure http://www.wurvoc.org/vocabularies/om-1.8/ of SI units, derived units, or other non-SI units accepted for use within the SI.
Examples
+ + + + + + + + + +
samplingProtocol
Identifierhttp://rs.tdwg.org/dwc/iri/samplingProtocol
DefinitionThe methods or protocols used during a dwc:Event, denoted by an IRI.
CommentsRecommended best practice is to describe a dwc:Event with no more than one sampling protocol. In the case of a summary dwc:Event in which a specific protocol cannot be attributed to specific dwc:Occurrences, the recommended best practice is to repeat the property for each IRI that denotes a different sampling protocol that applies to the dwc:Occurrence.
Exampleshttps://doi.org/10.1111/j.1466-8238.2009.00467.x
+ + + + + + + + + +
sex
Identifierhttp://rs.tdwg.org/dwc/iri/sex
DefinitionThe sex of the biological individual(s) represented in the dwc:Occurrence.
CommentsRecommended best practice is to use a controlled vocabulary. Terms in the dwciri: namespace are intended to be used in RDF with non-literal objects.
Examples
+ + + + + + + + + +
toDigitalSpecimen
Identifierhttp://rs.tdwg.org/dwc/iri/toDigitalSpecimen
DefinitionUse to link a dwc:Identification instance subject to a taxonomic entity such as a taxon, taxon concept, or taxon name use.
CommentsUse to link a dwc:MaterialEntity instance subject to a Digital Specimen entity.
Examples
+ + + + + + + + + +
toTaxon
Identifierhttp://rs.tdwg.org/dwc/iri/toTaxon
DefinitionUse to link a dwc:Identification instance subject to a taxonomic entity such as a taxon, taxon concept, or taxon name use.
CommentsA "convenience property" that replaces Darwin Core literal-value terms related to taxonomic entities. See Section 2.7.4 of the Darwin Core RDF Guide for details.
Examples
+ + + + + + + + + +
typeStatus
Identifierhttp://rs.tdwg.org/dwc/iri/typeStatus
DefinitionA nomenclatural type (type status, typified scientific name, publication) applied to the subject.
CommentsTerms in the dwciri: namespace are intended to be used in RDF with non-literal objects.
Examples
+ + + + + + + + + +
verbatimCoordinateSystem
Identifierhttp://rs.tdwg.org/dwc/iri/verbatimCoordinateSystem
DefinitionThe spatial coordinate system for the dwc:verbatimLatitude and dwc:verbatimLongitude or the dwc:verbatimCoordinates of the dcterms:Location.
CommentsRecommended best practice is to use a controlled vocabulary. Terms in the dwciri: namespace are intended to be used in RDF with non-literal objects.
Examples
+ + + + + + + + + +
verbatimSRS
Identifierhttp://rs.tdwg.org/dwc/iri/verbatimSRS
DefinitionThe ellipsoid, geodetic datum, or spatial reference system (SRS) upon which coordinates given in dwc:verbatimLatitude and dwc:verbatimLongitude, or dwc:verbatimCoordinates are based.
CommentsRecommended best practice is to use an IRI for the EPSG code of the SRS, if known. Otherwise use a controlled vocabulary IRI for the name or code of the geodetic datum, if known. Otherwise use a controlled vocabulary IRI for the name or code of the ellipsoid, if known. Otherwise use an IRI for the value corresponding to not recorded.
Examples
+ + + + + + + + + +
verticalDatum
Identifierhttp://rs.tdwg.org/dwc/iri/verticalDatum
DefinitionThe vertical datum used as the reference upon which the values in the elevation terms are based.
CommentsRecommended best practice is to use a controlled vocabulary. Terms in the dwciri: namespace are intended to be used in RDF with non-literal objects.
Examples
+ + + + + + + + + +
vitality
Identifierhttp://rs.tdwg.org/dwc/iri/vitality
DefinitionAn indication of whether a dwc:Organism was alive or dead at the time of collection or observation.
CommentsRecommended best practice is to use a controlled vocabulary. Intended to be used with records having a dwc:basisOfRecord of PreservedSpecimen, MaterialEntity, MaterialSample, or HumanObservation. Terms in the dwciri: namespace are intended to be used in RDF with non-literal objects.
Examples
+ +
+## LivingSpecimen
+ +
+
+ + + + + + + + + +
LivingSpecimen Class
Identifierhttp://rs.tdwg.org/dwc/terms/LivingSpecimen
DefinitionA specimen that is alive.
Comments
Examples
  • a living plant in a botanical garden
  • a living animal in a zoo
+ + +
+## PreservedSpecimen
+ +
+
+ + + + + + + + + +
PreservedSpecimen Class
Identifierhttp://rs.tdwg.org/dwc/terms/PreservedSpecimen
DefinitionA specimen that has been preserved.
Comments
Examples
  • a plant on an herbarium sheet
  • a cataloged lot of fish in a jar
+ + +
+## FossilSpecimen
+ +
+
+ + + + + + + + + +
FossilSpecimen Class
Identifierhttp://rs.tdwg.org/dwc/terms/FossilSpecimen
DefinitionA preserved specimen that is a fossil.
Comments
Examples
  • a body fossil
  • a coprolite
  • a gastrolith
  • an ichnofossil
  • a piece of a petrified tree
+ + +
+## MaterialCitation
+ +
+
+ + + + + + + + + +
MaterialCitation Class
Identifierhttp://rs.tdwg.org/dwc/terms/MaterialCitation
DefinitionA reference to or citation of one, a part of, or multiple specimens in scholarly publications.
CommentsThis class constitutes a new value for the controlled vocabulary in the recommendations for basisOfRecord. When importing Darwin Core Archives of literature-based datasets to GBIF, the basisOfRecord should be changed from "Occurrence", "PreservedSpecimen" or "Literature" to "MaterialCitation".
Examples
  • a citation of a physical specimen from a scientific collection in a taxonomic treatment in a scientific publication
  • a citation of a group of physical specimens, such as paratypes in a taxonomic treatment in a scientific publication
+ + +
+## HumanObservation
+ +
+
+ + + + + + + + + +
HumanObservation Class
Identifierhttp://rs.tdwg.org/dwc/terms/HumanObservation
DefinitionAn output of a human observation process.
Comments
Examples
  • evidence of a dwc:Occurrence taken from field notes or literature
  • a record of a dwc:Occurrence without physical evidence or evidence captured with a machine
+ + +
+## MachineObservation
+ +
+
+ + + + + + + + + +
MachineObservation Class
Identifierhttp://rs.tdwg.org/dwc/terms/MachineObservation
DefinitionAn output of a machine observation process.
Comments
Examples
  • a photograph
  • a video
  • an audio recording
  • a remote sensing image
  • a dwc:Occurrence record based on telemetry
+ + +
+## Cite Darwin Core
+
+To cite Darwin Core in general, use the peer-reviewed article on Darwin Core:
+
+> Wieczorek J, Bloom D, Guralnick R, Blum S, Döring M, et al. (2012) Darwin Core: An Evolving Community-Developed Biodiversity Data Standard. PLoS ONE 7(1): e29715.
+
+To cite the standard document upon which this page is built, use the following:
+
+> Darwin Core Maintenance Group. 2021. List of Darwin Core terms. Biodiversity Information Standards (TDWG).
+
+To cite this document specifically, use the following:
+
+> Darwin Core Maintenance Group. 2021. Darwin Core Quick Reference Guide. Biodiversity Information Standards (TDWG).
\ No newline at end of file
diff --git a/docs/claude/dwca-format-reference.md b/docs/claude/dwca-format-reference.md
new file mode 100644
index 000000000..dec394bce
--- /dev/null
+++ b/docs/claude/dwca-format-reference.md
@@ -0,0 +1,179 @@
+# Darwin Core Archive (DwC-A) Format Reference
+
+## What is DwC-A?
+
+A ZIP archive containing standardized biodiversity data files. The standard format for sharing occurrence and sampling-event data with GBIF, OBIS, and other biodiversity data aggregators.
+
+## Archive Structure
+
+```
+archive.zip
+├── meta.xml          # Required: describes file structure and term mappings
+├── eml.xml           # Recommended: dataset metadata (Ecological Metadata Language)
+├── event.txt         # Core file (tab-separated)
+├── occurrence.txt    # Extension file (tab-separated)
+└── (other extensions like multimedia.txt, measurementorfact.txt)
+```
+
+## Star Schema
+
+One **core** file, surrounded by **extension** files. Extensions link back to the core via an ID column.
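The link between extension rows and core rows can be sketched roughly as follows. This is only an illustration of the star-schema idea, not code from the exporter: the helper name `link_extension_to_core`, the row dictionaries, and the sample IDs are all hypothetical, and `eventID` is assumed to be the shared ID column.

```python
from collections import defaultdict


def link_extension_to_core(core_rows, extension_rows, id_column="eventID"):
    """Group extension rows under their core record.

    Returns (grouped, orphans): `grouped` maps a core ID to its extension
    rows, `orphans` lists extension rows whose ID is missing from the core.
    """
    core_ids = {row[id_column] for row in core_rows}
    grouped = defaultdict(list)
    orphans = []
    for row in extension_rows:
        if row[id_column] in core_ids:
            grouped[row[id_column]].append(row)
        else:
            orphans.append(row)
    return dict(grouped), orphans


# Hypothetical sample data: two events (core), three occurrences (extension).
events = [
    {"eventID": "urn:ami:event:demo:1"},
    {"eventID": "urn:ami:event:demo:2"},
]
occurrences = [
    {"eventID": "urn:ami:event:demo:1", "occurrenceID": "urn:ami:occurrence:demo:10"},
    {"eventID": "urn:ami:event:demo:1", "occurrenceID": "urn:ami:occurrence:demo:11"},
    {"eventID": "urn:ami:event:demo:9", "occurrenceID": "urn:ami:occurrence:demo:12"},
]

grouped, orphans = link_extension_to_core(events, occurrences)
```

A check of this kind may be useful before packaging an archive, since any row left in `orphans` points at a core record that does not exist.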
+
+For sampling-event datasets (like AMI):
+- **Core**: Event (one row per sampling event)
+- **Extension**: Occurrence (many occurrences per event)
+
+## meta.xml Specification
+
+```xml
+ + + + +
+event.txt
+ + + + + + + + +
+occurrence.txt
+ + + + + + + +
+```
+
+### Key Attributes
+
+| Attribute | Default | Notes |
+|-----------|---------|-------|
+| `rowType` | Required | URI: `http://rs.tdwg.org/dwc/terms/Event`, `...Occurrence`, `...Taxon` |
+| `fieldsTerminatedBy` | `,` | Use `\t` for TSV (recommended for DwC-A) |
+| `linesTerminatedBy` | `\n` | Standard newline |
+| `fieldsEnclosedBy` | `"` | Quote character |
+| `encoding` | `UTF-8` | Always use UTF-8 |
+| `ignoreHeaderLines` | `0` | Set to `1` if header row present |
+| `dateFormat` | `YYYY-MM-DD` | ISO 8601 |
+
+### Field Element
+
+- `index` (0-based): column position in the data file
+- `term`: Darwin Core term URI
+- `default`: constant value for all rows (no index needed)
+
+### ID Elements
+
+- `<id>` in core: column containing unique record ID
+- `<coreid>` in extensions: column containing the core record's ID (foreign key)
+
+## EML Metadata (eml.xml)
+
+Describes the dataset: title, abstract, creators, geographic/temporal coverage, methods, etc. GBIF provides an EML profile. Minimum useful content:
+
+```xml
+ + +
+{project.name}
+
+{project.owner or institution}
+ +
+{project.description}
+ +
+License information here
+ + +
+```
+
+## Key DwC Terms for AMI Data
+
+### Event Terms (Core)
+
+| DwC Term | AMI Source | Notes |
+|----------|-----------|-------|
+| eventID | `urn:ami:event:{project_slug}:{event.id}` | Globally unique |
+| parentEventID | | Empty for now (could link to deployment-level events) |
+| eventType | `"CameraTrapSession"` | Or custom vocabulary |
+| eventDate | `event.start` / `event.end` as ISO interval | `2024-06-15/2024-06-16` |
+| year | from `event.start` | |
+| month | from `event.start` | |
+| day | from `event.start` | |
+| samplingProtocol | `"automated light trap with camera"` | Project-level constant |
+| sampleSizeValue | `event.captures_count` | Number of images |
+| sampleSizeUnit | `"images"` | |
+| samplingEffort | `event.duration` formatted | e.g. "12 hours" |
+| eventRemarks | | |
+| **Location terms (on event)** | | |
+| locationID | `deployment.name` or `site.name` | |
+| decimalLatitude | `deployment.latitude` | |
+| decimalLongitude | `deployment.longitude` | |
+| geodeticDatum | `"WGS84"` | Assumed |
+| coordinateUncertaintyInMeters | | Not currently stored |
+
+### Occurrence Terms (Extension)
+
+| DwC Term | AMI Source | Notes |
+|----------|-----------|-------|
+| eventID | Same as core eventID | Links occurrence to event |
+| occurrenceID | `urn:ami:occurrence:{project_slug}:{occurrence.id}` | Globally unique |
+| basisOfRecord | `"MachineObservation"` | All records |
+| occurrenceStatus | `"present"` | Always present (we don't record absences) |
+| scientificName | `occurrence.determination.name` | |
+| taxonRank | `occurrence.determination.rank` | Lowercase |
+| kingdom | from `determination.parents_json` | Walk parent chain |
+| phylum | from `determination.parents_json` | |
+| class | from `determination.parents_json` | |
+| order | from `determination.parents_json` | |
+| family | from `determination.parents_json` | |
+| genus | from `determination.parents_json` | |
+| specificEpithet | split from species name | Second word of binomial |
+| vernacularName | `determination.common_name_en` | |
+| taxonID | `determination.gbif_taxon_key` or internal URN | |
+| individualCount | `occurrence.detections_count` | Number of detections |
+| associatedMedia | Detection image URLs | Pipe-separated |
+| identifiedBy | `"AMI ML Pipeline"` or identification user | |
+| dateIdentified | `occurrence.created_at` or identification date | |
+| identificationRemarks | Score info, algorithm used | |
+| identificationVerificationStatus | Verified/Not verified | Based on identifications |
+
+## Validation
+
+- Core IDs must be unique
+- Extension coreid values must reference existing core IDs
+- No literal "NULL" values
+- UTF-8 encoding throughout
+- GBIF validator: https://www.gbif.org/tools/data-validator
+
+## References
+
+- DwC Text Guide: https://dwc.tdwg.org/text/
+- GBIF DwC-A Guide: https://ipt.gbif.org/manual/en/ipt/latest/dwca-guide
+- DwC Terms: https://dwc.tdwg.org/terms/
+- Full terms reference downloaded to: `docs/claude/dwc-terms-reference.md`
diff --git a/docs/claude/export-framework.md b/docs/claude/export-framework.md
new file mode 100644
index 000000000..7712ba7e1
--- /dev/null
+++ b/docs/claude/export-framework.md
@@ -0,0 +1,112 @@
+# Export Framework Technical Reference
+
+## Architecture Overview
+
+The export system uses a registry pattern where format-specific exporters register themselves and are dispatched by `DataExport.run_export()`.
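For readers unfamiliar with the registry pattern mentioned above, a minimal sketch might look like the following. It is a toy stand-in, not the project's `ExportRegistry`: the names `SketchRegistry` and `DemoCSVExporter`, the `"demo_csv"` key, and the placeholder path are all hypothetical, and only the decorator-style `register(...)` call shape is taken from this document.

```python
class SketchRegistry:
    """Toy format registry; illustrative only, not the real ExportRegistry."""

    _exporters = {}

    @classmethod
    def register(cls, format_key):
        # Returns a decorator so it can be used as register("key")(ExporterClass).
        def decorator(exporter_class):
            cls._exporters[format_key] = exporter_class
            return exporter_class

        return decorator

    @classmethod
    def get_exporter(cls, format_key):
        # Unknown keys yield None in this sketch; the real registry may differ.
        return cls._exporters.get(format_key)

    @classmethod
    def get_supported_formats(cls):
        return list(cls._exporters)


@SketchRegistry.register("demo_csv")
class DemoCSVExporter:
    file_format = "csv"

    def export(self):
        return "/tmp/demo.csv"  # placeholder path, nothing is written


exporter_class = SketchRegistry.get_exporter("demo_csv")
```

The appeal of this design is that adding a format only requires defining and registering a class; the dispatching code looks the class up by its format key and never needs to change.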
+
+### Key Files
+
+| File | Purpose |
+|------|---------|
+| `ami/exports/base.py` | `BaseExporter` ABC - all exporters inherit from this |
+| `ami/exports/registry.py` | `ExportRegistry` - maps format strings to exporter classes |
+| `ami/exports/format_types.py` | Concrete exporters: `JSONExporter`, `CSVExporter` |
+| `ami/exports/models.py` | `DataExport` model - tracks export jobs, files, stats |
+| `ami/exports/utils.py` | `apply_filters()`, `get_data_in_batches()`, `generate_fake_request()` |
+| `ami/exports/views.py` | `DataExportViewSet` - API endpoint for creating/listing exports |
+| `ami/exports/serializers.py` | `DataExportSerializer` - validates format, filters |
+| `ami/exports/signals.py` | Deletes exported file when `DataExport` is deleted |
+
+### Flow
+
+```
+1. User POST /api/v2/exports/ with {format, filters, project}
+2. DataExportSerializer validates format against ExportRegistry
+3. DataExport created, Job created (job_type_key="data_export")
+4. Celery task calls DataExport.run_export()
+5. run_export() calls DataExport.get_exporter() → ExportRegistry lookup
+6. Exporter.__init__() builds queryset with filters
+7. Exporter.export() writes temp file, returns path
+8. DataExport.save_export_file() uploads to default_storage (S3/MinIO)
+9. file_url saved to DataExport model
+```
+
+### BaseExporter (ami/exports/base.py)
+
+```python
+class BaseExporter(ABC):
+    file_format = ""  # e.g. "json", "csv", "zip"
+    serializer_class = None  # DRF serializer for data transformation
+    filter_backends = []  # DRF filter backends
+
+    def __init__(self, data_export):
+        # Sets self.data_export, self.job, self.project
+        # Builds self.queryset using get_queryset() + apply_filters()
+        # Sets self.total_records = queryset.count()
+
+    @abstractmethod
+    def export(self) -> str:
+        """Must return path to temp file."""
+
+    @abstractmethod
+    def get_queryset(self):
+        """Must return a Django QuerySet."""
+
+    def get_filter_backends(self):
+        return [OccurrenceCollectionFilter]  # default
+
+    def update_export_stats(self, file_temp_path):
+        """Updates record_count and file_size on DataExport."""
+
+    def update_job_progress(self, records_exported):
+        """Updates Job progress stage."""
+```
+
+### ExportRegistry (ami/exports/registry.py)
+
+```python
+ExportRegistry.register("format_name")(ExporterClass)
+ExportRegistry.get_exporter("format_name")  # → ExporterClass
+ExportRegistry.get_supported_formats()  # → ["occurrences_api_json", "occurrences_simple_csv"]
+```
+
+### DataExport Model (ami/exports/models.py)
+
+Key fields:
+- `user` FK → User (who triggered)
+- `project` FK → Project (scoped to project)
+- `format` CharField (registry key)
+- `filters` JSONField (e.g. `{"collection_id": 5}`)
+- `filters_display` JSONField (precomputed human-readable)
+- `file_url` URLField (final download URL)
+- `record_count` PositiveIntegerField
+- `file_size` PositiveBigIntegerField
+
+Key methods:
+- `run_export()` - orchestrates the full export pipeline
+- `save_export_file(temp_path)` - uploads to storage, returns URL
+- `generate_filename()` - `{project_slug}_export-{pk}.{ext}`
+- `get_exporter()` - cached exporter instance
+
+### Adding a New Export Format
+
+1. Create exporter class extending `BaseExporter`
+2. Set `file_format` (file extension)
+3. Implement `get_queryset()` and `export()`
+4. Register: `ExportRegistry.register("format_key")(YourExporter)`
+5. The format automatically appears in the API's valid choices
+
+### Utilities (ami/exports/utils.py)
+
+- `generate_fake_request()` - creates a DRF Request for serializer context (needed because exports run in Celery, not in HTTP request context)
+- `apply_filters(queryset, filters, filter_backends)` - applies DRF filter backends using fake request with filter query params
+- `get_data_in_batches(queryset, serializer_class, batch_size=100)` - yields batches of serialized data using queryset.iterator()
+
+### Important Notes
+
+- Exports run as Celery tasks, so no real HTTP request is available
+- The `generate_fake_request()` utility creates a mock DRF request for serializer context (needed for HyperlinkedModelSerializer URLs)
+- Filters are passed as query params on the fake request
+- Default filter backend is `OccurrenceCollectionFilter` (filters by collection_id)
+- The export file is written to a temp file, then uploaded to default_storage (S3/MinIO)
+- On DataExport deletion, the signal handler deletes the file from storage

From 3c9945b579bc8bdb336edfecc47056effc043cee Mon Sep 17 00:00:00 2001
From: Michael Bunsen
Date: Wed, 11 Feb 2026 12:02:07 -0800
Subject: [PATCH 02/15] feat(exports): add Darwin Core Archive (DwC-A) export format

Add Event Core + Occurrence Extension DwC-A exporter that produces a GBIF-compatible ZIP containing event.txt, occurrence.txt, meta.xml, and eml.xml. Events are the core entity with occurrences linked via eventID foreign key.
Key design decisions: - Direct TSV writing with iterator(chunk_size=500) instead of DRF serializers - Taxonomy hierarchy extracted from parents_json to avoid N+1 queries - meta.xml generated from the same field definitions used for TSV columns - basisOfRecord = "MachineObservation" for all records - URN format IDs: urn:ami:event:{slug}:{id}, urn:ami:occurrence:{slug}:{id} Co-Authored-By: Claude --- ami/exports/dwca.py | 353 ++++++++++++++++++++++++++++++++++++ ami/exports/format_types.py | 76 ++++++++ ami/exports/registry.py | 1 + 3 files changed, 430 insertions(+) create mode 100644 ami/exports/dwca.py diff --git a/ami/exports/dwca.py b/ami/exports/dwca.py new file mode 100644 index 000000000..1936252d6 --- /dev/null +++ b/ami/exports/dwca.py @@ -0,0 +1,353 @@ +""" +Darwin Core Archive (DwC-A) field mappings, metadata generators, and taxonomy helpers. + +Implements Event Core architecture with Occurrence Extension for GBIF-compatible archives. +""" + +import csv +import datetime +import logging +import tempfile +import zipfile +from xml.etree import ElementTree as ET + +from django.utils.text import slugify + +logger = logging.getLogger(__name__) + +# DwC term URI base +DWC = "http://rs.tdwg.org/dwc/terms/" +DC = "http://purl.org/dc/terms/" + +# ────────────────────────────────────────────────────────────── +# Event field definitions: (dwc_term_uri, header_name, getter) +# ────────────────────────────────────────────────────────────── + +EVENT_FIELDS: list[tuple[str, str, object]] = [ + (DWC + "eventID", "eventID", lambda e, slug: f"urn:ami:event:{slug}:{e.id}"), + (DWC + "eventDate", "eventDate", lambda e, slug: _format_event_date(e)), + (DWC + "eventTime", "eventTime", lambda e, slug: _format_time(e.start)), + (DWC + "year", "year", lambda e, slug: str(e.start.year) if e.start else ""), + (DWC + "month", "month", lambda e, slug: str(e.start.month) if e.start else ""), + (DWC + "day", "day", lambda e, slug: str(e.start.day) if e.start else ""), + (DWC + 
"samplingProtocol", "samplingProtocol", lambda e, slug: "automated light trap with camera"), + (DWC + "sampleSizeValue", "sampleSizeValue", lambda e, slug: str(e.captures_count or 0)), + (DWC + "sampleSizeUnit", "sampleSizeUnit", lambda e, slug: "images"), + (DWC + "samplingEffort", "samplingEffort", lambda e, slug: _format_duration(e)), + (DWC + "locationID", "locationID", lambda e, slug: e.deployment.name if e.deployment else ""), + ( + DWC + "decimalLatitude", + "decimalLatitude", + lambda e, slug: _format_coord(e.deployment.latitude if e.deployment else None), + ), + ( + DWC + "decimalLongitude", + "decimalLongitude", + lambda e, slug: _format_coord(e.deployment.longitude if e.deployment else None), + ), + (DWC + "geodeticDatum", "geodeticDatum", lambda e, slug: "WGS84"), + (DWC + "datasetName", "datasetName", lambda e, slug: e.project.name if e.project else ""), + (DC + "modified", "modified", lambda e, slug: _format_datetime(e.updated_at)), +] + +# ────────────────────────────────────────────────────────────── +# Occurrence field definitions +# ────────────────────────────────────────────────────────────── + +OCCURRENCE_FIELDS: list[tuple[str, str, object]] = [ + (DWC + "eventID", "eventID", lambda o, slug: f"urn:ami:event:{slug}:{o.event_id}" if o.event_id else ""), + (DWC + "occurrenceID", "occurrenceID", lambda o, slug: f"urn:ami:occurrence:{slug}:{o.id}"), + (DWC + "basisOfRecord", "basisOfRecord", lambda o, slug: "MachineObservation"), + (DWC + "occurrenceStatus", "occurrenceStatus", lambda o, slug: "present"), + (DWC + "scientificName", "scientificName", lambda o, slug: o.determination.name if o.determination else ""), + (DWC + "taxonRank", "taxonRank", lambda o, slug: o.determination.rank.lower() if o.determination else ""), + (DWC + "kingdom", "kingdom", lambda o, slug: _get_rank_from_parents(o, "KINGDOM")), + (DWC + "phylum", "phylum", lambda o, slug: _get_rank_from_parents(o, "PHYLUM")), + (DWC + "class", "class", lambda o, slug: 
_get_rank_from_parents(o, "CLASS")), + (DWC + "order", "order", lambda o, slug: _get_rank_from_parents(o, "ORDER")), + (DWC + "family", "family", lambda o, slug: _get_rank_from_parents(o, "FAMILY")), + (DWC + "genus", "genus", lambda o, slug: _get_rank_from_parents(o, "GENUS")), + ( + DWC + "specificEpithet", + "specificEpithet", + lambda o, slug: get_specific_epithet(o.determination.name if o.determination else ""), + ), + ( + DWC + "vernacularName", + "vernacularName", + lambda o, slug: o.determination.common_name_en or "" if o.determination else "", + ), + ( + DWC + "taxonID", + "taxonID", + lambda o, slug: str(o.determination.gbif_taxon_key) + if o.determination and o.determination.gbif_taxon_key + else "", + ), + (DWC + "individualCount", "individualCount", lambda o, slug: str(getattr(o, "detections_count", 0) or 0)), + ( + DWC + "identificationVerificationStatus", + "identificationVerificationStatus", + lambda o, slug: _get_verification_status(o), + ), + (DC + "modified", "modified", lambda o, slug: _format_datetime(o.updated_at)), +] + + +# ────────────────────────────────────────────────────────────── +# Helper functions +# ────────────────────────────────────────────────────────────── + + +def _format_event_date(event) -> str: + """Format event date as ISO date or date interval.""" + if not event.start: + return "" + start_date = event.start.date().isoformat() + if event.end and event.end.date() != event.start.date(): + return f"{start_date}/{event.end.date().isoformat()}" + return start_date + + +def _format_time(dt) -> str: + if not dt: + return "" + return dt.strftime("%H:%M:%S") + + +def _format_datetime(dt) -> str: + if not dt: + return "" + if isinstance(dt, datetime.datetime): + return dt.isoformat() + return str(dt) + + +def _format_coord(value) -> str: + if value is None: + return "" + return str(round(value, 6)) + + +def _format_duration(event) -> str: + """Format event duration as human-readable string.""" + if not event.start or not event.end: 
+ return "" + delta = event.end - event.start + total_seconds = int(delta.total_seconds()) + hours, remainder = divmod(total_seconds, 3600) + minutes, _ = divmod(remainder, 60) + if hours > 0: + return f"{hours}h {minutes}m" + return f"{minutes}m" + + +def _get_rank_from_parents(occurrence, rank: str) -> str: + """Extract a taxon name at a specific rank from determination.parents_json.""" + if not occurrence.determination: + return "" + parents = occurrence.determination.parents_json + if not parents: + return "" + for parent in parents: + # parents_json contains TaxonParent objects (or dicts with id, name, rank) + parent_rank = parent.rank if hasattr(parent, "rank") else parent.get("rank", "") + # TaxonRank enum values are uppercase strings + parent_rank_str = parent_rank.name if hasattr(parent_rank, "name") else str(parent_rank) + if parent_rank_str.upper() == rank: + return parent.name if hasattr(parent, "name") else parent.get("name", "") + # Also check the determination itself if it matches the requested rank + det_rank = occurrence.determination.rank + if det_rank.upper() == rank: + return occurrence.determination.name + return "" + + +def get_specific_epithet(name: str) -> str: + """Extract the specific epithet (second word) from a binomial name.""" + parts = name.split() + if len(parts) >= 2: + return parts[1] + return "" + + +def _get_verification_status(occurrence) -> str: + """Return verification status based on whether identifications exist.""" + # Use prefetched identifications if available + if hasattr(occurrence, "_prefetched_objects_cache") and "identifications" in occurrence._prefetched_objects_cache: + return "verified" if occurrence.identifications.all() else "unverified" + # Fall back to exists() check + return "verified" if occurrence.identifications.exists() else "unverified" + + +# ────────────────────────────────────────────────────────────── +# TSV writing +# ────────────────────────────────────────────────────────────── + + +def write_tsv( 
+ filepath: str, fields: list[tuple[str, str, object]], queryset, project_slug: str, progress_callback=None +): + """Write a tab-delimited file from a queryset using field definitions. + + Returns the number of records written. + """ + headers = [f[1] for f in fields] + getters = [f[2] for f in fields] + records_written = 0 + + with open(filepath, "w", encoding="utf-8", newline="") as f: + writer = csv.writer(f, delimiter="\t", quoting=csv.QUOTE_MINIMAL, lineterminator="\n") + writer.writerow(headers) + + for obj in queryset.iterator(chunk_size=500): + row = [getter(obj, project_slug) for getter in getters] + writer.writerow(row) + records_written += 1 + if progress_callback and records_written % 500 == 0: + progress_callback(records_written) + + return records_written + + +# ────────────────────────────────────────────────────────────── +# meta.xml generation +# ────────────────────────────────────────────────────────────── + + +def generate_meta_xml( + event_fields, occurrence_fields, event_filename="event.txt", occurrence_filename="occurrence.txt" +) -> str: + """Generate DwC-A meta.xml descriptor mapping columns to DwC term URIs.""" + + archive = ET.Element("archive") + archive.set("xmlns", "http://rs.tdwg.org/dwc/text/") + archive.set("metadata", "eml.xml") + + # Core: Event + core = ET.SubElement(archive, "core") + core.set("rowType", DWC + "Event") + core.set("encoding", "UTF-8") + core.set("fieldsTerminatedBy", "\\t") + core.set("linesTerminatedBy", "\\n") + core.set("fieldsEnclosedBy", "") + core.set("ignoreHeaderLines", "1") + + files = ET.SubElement(core, "files") + location = ET.SubElement(files, "location") + location.text = event_filename + + # Column 0 is the id (eventID) + id_elem = ET.SubElement(core, "id") + id_elem.set("index", "0") + + for i, (term_uri, header, _) in enumerate(event_fields): + if i == 0: + continue # Already declared as <id> + field = ET.SubElement(core, "field") + field.set("index", str(i)) + field.set("term", term_uri) + + # 
Extension: Occurrence + extension = ET.SubElement(archive, "extension") + extension.set("rowType", DWC + "Occurrence") + extension.set("encoding", "UTF-8") + extension.set("fieldsTerminatedBy", "\\t") + extension.set("linesTerminatedBy", "\\n") + extension.set("fieldsEnclosedBy", "") + extension.set("ignoreHeaderLines", "1") + + files = ET.SubElement(extension, "files") + location = ET.SubElement(files, "location") + location.text = occurrence_filename + + # Column 0 is the coreid (eventID foreign key) + coreid = ET.SubElement(extension, "coreid") + coreid.set("index", "0") + + for i, (term_uri, header, _) in enumerate(occurrence_fields): + if i == 0: + continue # Already declared as <coreid> + field = ET.SubElement(extension, "field") + field.set("index", str(i)) + field.set("term", term_uri) + + # Format with XML declaration + ET.indent(archive, space=" ") + xml_str = ET.tostring(archive, encoding="unicode", xml_declaration=False) + return '\n' + xml_str + "\n" + + +# ────────────────────────────────────────────────────────────── +# eml.xml generation +# ────────────────────────────────────────────────────────────── + + +def generate_eml_xml(project, events_queryset=None) -> str: + """Generate minimal EML 2.1.1 metadata XML for the dataset.""" + + project_slug = slugify(project.name) + now = datetime.datetime.now().strftime("%Y-%m-%dT%H:%M:%S") + + eml = ET.Element("eml:eml") + eml.set("xmlns:eml", "eml://ecoinformatics.org/eml-2.1.1") + eml.set("xmlns:dc", "http://purl.org/dc/terms/") + eml.set("xmlns:xsi", "http://www.w3.org/2001/XMLSchema-instance") + eml.set("xsi:schemaLocation", "eml://ecoinformatics.org/eml-2.1.1 eml.xsd") + eml.set("packageId", f"urn:ami:dataset:{project_slug}:{now}") + eml.set("system", "AMI") + + dataset = ET.SubElement(eml, "dataset") + + title = ET.SubElement(dataset, "title") + title.text = project.name + + # Creator + creator = ET.SubElement(dataset, "creator") + org = ET.SubElement(creator, "organizationName") + org.text = "Automated 
Monitoring of Insects (AMI)" + if project.owner: + individual = ET.SubElement(creator, "individualName") + surname = ET.SubElement(individual, "surName") + surname.text = project.owner.email + + # Abstract + abstract = ET.SubElement(dataset, "abstract") + para = ET.SubElement(abstract, "para") + para.text = project.description or f"Biodiversity monitoring data from {project.name}." + + # Contact + contact = ET.SubElement(dataset, "contact") + contact_org = ET.SubElement(contact, "organizationName") + contact_org.text = "Automated Monitoring of Insects (AMI)" + + # Intellectual rights + rights = ET.SubElement(dataset, "intellectualRights") + rights_para = ET.SubElement(rights, "para") + rights_para.text = "This work is licensed under a Creative Commons Attribution 4.0 International License." + + ET.indent(eml, space=" ") + xml_str = ET.tostring(eml, encoding="unicode", xml_declaration=False) + return '\n' + xml_str + "\n" + + +# ────────────────────────────────────────────────────────────── +# Archive packaging +# ────────────────────────────────────────────────────────────── + + +def create_dwca_zip(event_file: str, occurrence_file: str, meta_xml: str, eml_xml: str) -> str: + """Package event.txt, occurrence.txt, meta.xml, and eml.xml into a DwC-A ZIP. + + Returns the path to the temporary ZIP file. 
+ """ + temp_zip = tempfile.NamedTemporaryFile(delete=False, suffix=".zip") + temp_zip.close() + + with zipfile.ZipFile(temp_zip.name, "w", zipfile.ZIP_DEFLATED) as zf: + zf.write(event_file, "event.txt") + zf.write(occurrence_file, "occurrence.txt") + zf.writestr("meta.xml", meta_xml) + zf.writestr("eml.xml", eml_xml) + + return temp_zip.name diff --git a/ami/exports/format_types.py b/ami/exports/format_types.py index bc628a8ef..f5eaf345b 100644 --- a/ami/exports/format_types.py +++ b/ami/exports/format_types.py @@ -186,3 +186,79 @@ def export(self): self.update_job_progress(records_exported) self.update_export_stats(file_temp_path=temp_file.name) return temp_file.name # Return the file path + + +class DwCAExporter(BaseExporter): + """Handles Darwin Core Archive (DwC-A) export with Event Core and Occurrence Extension.""" + + file_format = "zip" + + def get_queryset(self): + """Return the occurrence queryset (used by BaseExporter for record count).""" + return ( + Occurrence.objects.valid() # type: ignore[union-attr] + .filter(project=self.project) + .select_related( + "determination", + "event", + "deployment", + ) + .with_detections_count() + .with_identifications() + ) + + def get_events_queryset(self): + from ami.main.models import Event + + return Event.objects.filter(project=self.project).select_related( + "deployment", + "project", + ) + + def get_filter_backends(self): + # DwC-A exports events + occurrences; the collection-based filter doesn't apply + return [] + + def export(self): + """Export project data as a Darwin Core Archive ZIP.""" + from django.utils.text import slugify + + from ami.exports.dwca import ( + EVENT_FIELDS, + OCCURRENCE_FIELDS, + create_dwca_zip, + generate_eml_xml, + generate_meta_xml, + write_tsv, + ) + + project_slug = slugify(self.project.name) + + # Write event.txt + event_file = tempfile.NamedTemporaryFile(delete=False, suffix=".txt", mode="w", encoding="utf-8") + event_file.close() + events_qs = self.get_events_queryset() + 
event_count = write_tsv(event_file.name, EVENT_FIELDS, events_qs, project_slug) + logger.info(f"DwC-A: wrote {event_count} events") + + # Write occurrence.txt + occ_file = tempfile.NamedTemporaryFile(delete=False, suffix=".txt", mode="w", encoding="utf-8") + occ_file.close() + occ_count = write_tsv( + occ_file.name, + OCCURRENCE_FIELDS, + self.queryset, + project_slug, + progress_callback=lambda n: self.update_job_progress(n), + ) + logger.info(f"DwC-A: wrote {occ_count} occurrences") + + # Generate metadata + meta_xml = generate_meta_xml(EVENT_FIELDS, OCCURRENCE_FIELDS) + eml_xml = generate_eml_xml(self.project, events_qs) + + # Package into ZIP + zip_path = create_dwca_zip(event_file.name, occ_file.name, meta_xml, eml_xml) + + self.update_export_stats(file_temp_path=zip_path) + return zip_path diff --git a/ami/exports/registry.py b/ami/exports/registry.py index 29a4cc0e7..695ded051 100644 --- a/ami/exports/registry.py +++ b/ami/exports/registry.py @@ -27,3 +27,4 @@ def get_supported_formats(cls): ExportRegistry.register("occurrences_api_json")(format_types.JSONExporter) ExportRegistry.register("occurrences_simple_csv")(format_types.CSVExporter) +ExportRegistry.register("dwca")(format_types.DwCAExporter) From 0cf0b92ff519590a2fd8062a9bbb4b7a733c2741 Mon Sep 17 00:00:00 2001 From: Michael Bunsen Date: Wed, 11 Feb 2026 12:02:19 -0800 Subject: [PATCH 03/15] test(exports): add DwC-A export tests Test ZIP structure, event/occurrence headers and row counts, meta.xml core/extension structure, referential integrity between events and occurrences, taxonomy hierarchy extraction from parents_json, specific epithet parsing, and EML metadata validity. 
Co-Authored-By: Claude --- ami/exports/tests.py | 228 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 228 insertions(+) diff --git a/ami/exports/tests.py b/ami/exports/tests.py index 6dcd73915..78d1e7207 100644 --- a/ami/exports/tests.py +++ b/ami/exports/tests.py @@ -1,6 +1,9 @@ import csv import json import logging +import zipfile +from io import StringIO +from xml.etree import ElementTree as ET from django.core.files.base import ContentFile from django.core.files.storage import default_storage @@ -302,3 +305,228 @@ def test_non_member_cannot_create_export(self): self.non_member.has_perm(Project.Permissions.CREATE_DATA_EXPORT, self.project), "Non-member should not have create_dataexport permission", ) + + +class DwCAExportTest(TestCase): + """Tests for Darwin Core Archive (DwC-A) export format.""" + + def setUp(self): + self.project, self.deployment = setup_test_project(reuse=False) + self.user = self.project.owner + create_captures(deployment=self.deployment, num_nights=2, images_per_night=4, interval_minutes=1) + group_images_into_events(self.deployment) + create_taxa(self.project) + create_occurrences(num=10, deployment=self.deployment) + + # Verify test data was created + self.assertGreater(self.project.events.count(), 0, "No events created for testing.") + self.assertGreater( + Occurrence.objects.valid().filter(project=self.project).count(), # type: ignore[union-attr] + 0, + "No valid occurrences created for testing.", + ) + + def _run_export(self): + """Run a DwC-A export and return the file path.""" + data_export = DataExport.objects.create( + user=self.user, + project=self.project, + format="dwca", + job=None, + ) + file_url = data_export.run_export() + self.assertIsNotNone(file_url) + file_path = file_url.replace("/media/", "") + self.assertTrue(default_storage.exists(file_path)) + return file_path + + def test_dwca_exporter_is_registered(self): + """DwC-A exporter should be registered and retrievable.""" + from ami.exports.registry import 
ExportRegistry + + exporter_cls = ExportRegistry.get_exporter("dwca") + self.assertIsNotNone(exporter_cls, "DwC-A exporter not found in registry") + self.assertEqual(exporter_cls.file_format, "zip") + + def test_export_produces_valid_zip(self): + """Export should produce a valid ZIP with expected files.""" + file_path = self._run_export() + try: + with default_storage.open(file_path, "rb") as f: + self.assertTrue(zipfile.is_zipfile(f)) + f.seek(0) + with zipfile.ZipFile(f, "r") as zf: + names = zf.namelist() + self.assertIn("event.txt", names) + self.assertIn("occurrence.txt", names) + self.assertIn("meta.xml", names) + self.assertIn("eml.xml", names) + finally: + default_storage.delete(file_path) + + def test_event_headers_and_row_count(self): + """event.txt should have correct headers and row count matching events.""" + file_path = self._run_export() + try: + with default_storage.open(file_path, "rb") as f: + with zipfile.ZipFile(f, "r") as zf: + event_data = zf.read("event.txt").decode("utf-8") + reader = csv.DictReader(StringIO(event_data), delimiter="\t") + rows = list(reader) + + # Check headers + self.assertIn("eventID", reader.fieldnames) + self.assertIn("eventDate", reader.fieldnames) + self.assertIn("decimalLatitude", reader.fieldnames) + self.assertIn("samplingProtocol", reader.fieldnames) + + # Row count should match project events + expected_count = self.project.events.count() + self.assertEqual(len(rows), expected_count, "Event row count mismatch") + finally: + default_storage.delete(file_path) + + def test_occurrence_headers_and_row_count(self): + """occurrence.txt should have correct headers and row count matching valid occurrences.""" + file_path = self._run_export() + try: + with default_storage.open(file_path, "rb") as f: + with zipfile.ZipFile(f, "r") as zf: + occ_data = zf.read("occurrence.txt").decode("utf-8") + reader = csv.DictReader(StringIO(occ_data), delimiter="\t") + rows = list(reader) + + # Check headers + self.assertIn("occurrenceID", 
reader.fieldnames) + self.assertIn("scientificName", reader.fieldnames) + self.assertIn("basisOfRecord", reader.fieldnames) + self.assertIn("taxonRank", reader.fieldnames) + + # Row count should match valid occurrences + expected_count = ( + Occurrence.objects.valid().filter(project=self.project).count() # type: ignore[union-attr] + ) + self.assertEqual(len(rows), expected_count, "Occurrence row count mismatch") + + # All rows should have basisOfRecord = MachineObservation + for row in rows: + self.assertEqual(row["basisOfRecord"], "MachineObservation") + finally: + default_storage.delete(file_path) + + def test_meta_xml_structure(self): + """meta.xml should be valid XML with correct core/extension structure.""" + file_path = self._run_export() + try: + with default_storage.open(file_path, "rb") as f: + with zipfile.ZipFile(f, "r") as zf: + meta_xml = zf.read("meta.xml").decode("utf-8") + root = ET.fromstring(meta_xml) + + # Default namespace + ns = "http://rs.tdwg.org/dwc/text/" + + # Should have a core element with Event rowType + core = root.find(f"{{{ns}}}core") + self.assertIsNotNone(core, "meta.xml missing <core> element") + self.assertIn("Event", core.get("rowType", "")) + + # Should have an extension element with Occurrence rowType + ext = root.find(f"{{{ns}}}extension") + self.assertIsNotNone(ext, "meta.xml missing <extension> element") + self.assertIn("Occurrence", ext.get("rowType", "")) + + # Core should reference event.txt + core_location = core.find(f".//{{{ns}}}location") + self.assertIsNotNone(core_location, "meta.xml core missing <location>") + self.assertEqual(core_location.text, "event.txt") + + # Extension should reference occurrence.txt + ext_location = ext.find(f".//{{{ns}}}location") + self.assertIsNotNone(ext_location, "meta.xml extension missing <location>") + self.assertEqual(ext_location.text, "occurrence.txt") + finally: + default_storage.delete(file_path) + + def test_referential_integrity(self): + """All occurrence eventIDs should 
reference existing event eventIDs.""" + 
file_path = self._run_export() + try: + with default_storage.open(file_path, "rb") as f: + with zipfile.ZipFile(f, "r") as zf: + # Read event IDs + event_data = zf.read("event.txt").decode("utf-8") + event_reader = csv.DictReader(StringIO(event_data), delimiter="\t") + event_ids = {row["eventID"] for row in event_reader} + + # Read occurrence eventIDs + occ_data = zf.read("occurrence.txt").decode("utf-8") + occ_reader = csv.DictReader(StringIO(occ_data), delimiter="\t") + occ_event_ids = {row["eventID"] for row in occ_reader if row["eventID"]} + + # All occurrence eventIDs should exist in events + orphaned = occ_event_ids - event_ids + self.assertEqual( + len(orphaned), + 0, + f"Orphaned occurrence eventIDs (not in events): {orphaned}", + ) + finally: + default_storage.delete(file_path) + + def test_taxonomy_hierarchy_extraction(self): + """Taxonomy fields should be extracted from parents_json.""" + from ami.exports.dwca import _get_rank_from_parents + + # Get an occurrence with a determination that has parents + occurrence = ( + Occurrence.objects.valid() # type: ignore[union-attr] + .filter(project=self.project, determination__isnull=False) + .select_related("determination") + .first() + ) + self.assertIsNotNone(occurrence, "No occurrence with determination found") + + # Update parents_json on the taxon so we can test extraction + taxon = occurrence.determination + taxon.save(update_calculated_fields=True) + taxon.refresh_from_db() + + # If the taxon has parents, at least one rank should resolve + if taxon.parents_json: + ranks_found = [] + for rank in ["KINGDOM", "PHYLUM", "CLASS", "ORDER", "FAMILY", "GENUS"]: + value = _get_rank_from_parents(occurrence, rank) + if value: + ranks_found.append(rank) + self.assertGreater(len(ranks_found), 0, "No taxonomy ranks extracted from parents_json") + + def test_specific_epithet_extraction(self): + """get_specific_epithet should extract the second word of a binomial name.""" + from ami.exports.dwca import 
get_specific_epithet + + self.assertEqual(get_specific_epithet("Vanessa cardui"), "cardui") + self.assertEqual(get_specific_epithet("Vanessa"), "") + self.assertEqual(get_specific_epithet(""), "") + self.assertEqual(get_specific_epithet("Homo sapiens sapiens"), "sapiens") + + def test_eml_xml_valid(self): + """eml.xml should be valid XML with project metadata.""" + file_path = self._run_export() + try: + with default_storage.open(file_path, "rb") as f: + with zipfile.ZipFile(f, "r") as zf: + eml_xml = zf.read("eml.xml").decode("utf-8") + root = ET.fromstring(eml_xml) + + # Should have a dataset element + ns = {"eml": "eml://ecoinformatics.org/eml-2.1.1"} + dataset = root.find("eml:dataset", ns) or root.find("dataset") + self.assertIsNotNone(dataset, "eml.xml missing <dataset> element") + + # Title should match project name + title = dataset.find("eml:title", ns) or dataset.find("title") + self.assertIsNotNone(title) + self.assertEqual(title.text, self.project.name) + finally: + default_storage.delete(file_path) From 928d9fcb603b209a11e4cb3515f53e4ebb2abdef Mon Sep 17 00:00:00 2001 From: Michael Bunsen Date: Wed, 11 Feb 2026 12:05:17 -0800 Subject: [PATCH 04/15] docs: add feature context and roadmap to DwC-A export plan Co-Authored-By: Claude --- .agents/planning/dwca-export-plan.md | 21 ++++++++++++++++----- 1 file changed, 16 insertions(+), 5 deletions(-) diff --git a/.agents/planning/dwca-export-plan.md b/.agents/planning/dwca-export-plan.md index 3fbaed15b..818b5f29a 100644 --- a/.agents/planning/dwca-export-plan.md +++ b/.agents/planning/dwca-export-plan.md @@ -1,14 +1,25 @@ # Plan: Add DwC-A (Darwin Core Archive) Export Format +## Why + +AMI projects produce biodiversity occurrence data (species observations from automated insect monitoring stations). To make this data discoverable and citable in the global biodiversity research community, it needs to be published to GBIF (Global Biodiversity Information Facility). 
GBIF's standard ingestion format is the Darwin Core Archive (DwC-A). + +**Roadmap:** +1. **This PR** — Static DwC-A export: user triggers an export, downloads a ZIP file. Validates against GBIF's data validator. Serves as the foundation for all downstream GBIF integration. +2. **Near follow-up** — Enrich the archive with additional DwC extensions (multimedia, measurement/fact) and a more complete EML metadata profile. Apply project default filters to the export. +3. **Eventual** — Automated publishing: either push archives to a hosted GBIF IPT (Integrated Publishing Toolkit) server, or implement the IPT's RSS/DwC-A endpoint protocol directly within Antenna so it can act as its own IPT, serving a feed that GBIF crawls on a schedule. + ## Context -The project needs to export biodiversity data as Darwin Core Archives for sharing with GBIF and other aggregators. The export framework already exists (`ami/exports/`) with JSON and CSV formats registered. We need to add a new DwC-A exporter that produces a ZIP containing event.txt (core), occurrence.txt (extension), meta.xml, and eml.xml. +The export framework already exists (`ami/exports/`) with JSON and CSV formats registered via a simple registry pattern. Adding a new format requires: an exporter class, field mappings, and a one-line registration. The `DataExport` model and async job infrastructure handle storage, progress tracking, and file serving. 
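For readers unfamiliar with the registry pattern referred to above, the following is a minimal sketch of how a decorator-style registry of this shape could work. It is not the code in `ami/exports/registry.py`; `SketchRegistry` and `SketchDwCAExporter` are hypothetical names, and only the three call shapes (`register`, `get_exporter`, `get_supported_formats`) are taken from the framework reference in this series.

```python
class SketchRegistry:
    """Hypothetical stand-in for ExportRegistry, for illustration only."""

    _formats: dict[str, type] = {}

    @classmethod
    def register(cls, format_key: str):
        """Return a decorator that stores the exporter class under format_key."""

        def decorator(exporter_cls: type) -> type:
            cls._formats[format_key] = exporter_cls
            return exporter_cls

        return decorator

    @classmethod
    def get_exporter(cls, format_key: str):
        """Return the registered class, or None for an unknown key."""
        return cls._formats.get(format_key)

    @classmethod
    def get_supported_formats(cls) -> list[str]:
        """Return registry keys in registration order."""
        return list(cls._formats)


class SketchDwCAExporter:
    """Hypothetical exporter; the real one extends BaseExporter."""

    file_format = "zip"


# The "one-line registration" has this call shape.
SketchRegistry.register("dwca")(SketchDwCAExporter)
```

Because `register` returns a decorator, the same call also works as `@SketchRegistry.register("dwca")` above a class definition, if that style is preferred.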
**Decisions made:** -- Event-core architecture (events as core, occurrences as extension) -- URN format for IDs: `urn:ami:event:{project_slug}:{id}`, `urn:ami:occurrence:{project_slug}:{id}` -- Coordinates from Deployment lat/lon only (text locality fields deferred) -- `basisOfRecord` = `"MachineObservation"` for all records +- **Event-core architecture** (events as core, occurrences as extension) — This matches AMI's data model (monitoring sessions containing species observations) and is the recommended GBIF pattern for sampling-event datasets, which enables richer ecological analysis than occurrence-only archives. +- **URN format for IDs**: `urn:ami:event:{project_slug}:{id}`, `urn:ami:occurrence:{project_slug}:{id}` — Globally unique, stable, and human-readable. The project slug provides namespacing across AMI instances. +- **Coordinates from Deployment lat/lon only** (text locality fields like country/stateProvince deferred) — Deployments store coordinates; reverse geocoding for text fields is a separate concern. +- **`basisOfRecord` = `"MachineObservation"`** — GBIF's standard term for automated/sensor-derived observations, distinct from `HumanObservation`. +- **No DRF serializer** — DwC fields are flat extractions, not nested API representations. Direct TSV writing is simpler and faster. +- **Taxonomy from `parents_json`** — Avoids N+1 parent chain queries by walking the pre-computed `parents_json` list on each Taxon. 
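As a rough illustration of two of the decisions above (URN identifiers and rank lookup from `parents_json`), here is a small standalone sketch. The helper names `event_urn`, `occurrence_urn` and `rank_from_parents` are hypothetical, the sample parent list is made-up illustrative data, and the sketch assumes `parents_json` is available as a list of dicts with `id`, `name` and `rank` keys; the exporter in this series also handles object-style entries.

```python
def event_urn(project_slug: str, event_id: int) -> str:
    """Build an event identifier in the urn:ami:event:{slug}:{id} form."""
    return f"urn:ami:event:{project_slug}:{event_id}"


def occurrence_urn(project_slug: str, occurrence_id: int) -> str:
    """Build an occurrence identifier in the urn:ami:occurrence:{slug}:{id} form."""
    return f"urn:ami:occurrence:{project_slug}:{occurrence_id}"


def rank_from_parents(parents, rank: str) -> str:
    """Return the name of the first parent at the given rank, or "" if absent.

    Walks an in-memory list, so no extra query is issued per ancestor.
    """
    for parent in parents or []:
        if str(parent.get("rank", "")).upper() == rank.upper():
            return parent.get("name", "")
    return ""


# Illustrative data only (assumed shape of a parents_json value).
sample_parents = [
    {"id": 1, "name": "Animalia", "rank": "KINGDOM"},
    {"id": 2, "name": "Arthropoda", "rank": "PHYLUM"},
    {"id": 3, "name": "Insecta", "rank": "CLASS"},
    {"id": 4, "name": "Lepidoptera", "rank": "ORDER"},
    {"id": 5, "name": "Nymphalidae", "rank": "FAMILY"},
    {"id": 6, "name": "Vanessa", "rank": "GENUS"},
]

family = rank_from_parents(sample_parents, "FAMILY")
event_id = event_urn("demo-project", 42)
```

Returning an empty string for a missing rank keeps the TSV column present but blank, which matches how the field getters in this series treat missing values.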
## Implementation Steps From dd2309ee364b9873db73b6d614ae09586ec3a0d0 Mon Sep 17 00:00:00 2001 From: Michael Bunsen Date: Wed, 11 Feb 2026 12:11:48 -0800 Subject: [PATCH 05/15] docs: add review findings and follow-up roadmap to DwC-A plan Co-Authored-By: Claude --- .agents/planning/dwca-export-plan.md | 35 ++++++++++++++++++++++++++++ 1 file changed, 35 insertions(+) diff --git a/.agents/planning/dwca-export-plan.md b/.agents/planning/dwca-export-plan.md index 818b5f29a..8b198ffac 100644 --- a/.agents/planning/dwca-export-plan.md +++ b/.agents/planning/dwca-export-plan.md @@ -170,3 +170,38 @@ No changes needed to `DataExport` model. 2. Run new DwC-A tests 3. Manual test: create a DwC-A export via the API or admin, download the ZIP, inspect contents 4. Validate with GBIF Data Validator: https://www.gbif.org/tools/data-validator + +## Known issues to fix before merge + +1. **Occurrences without events produce empty `coreid`** — GBIF rejects orphaned extension rows. Need `.filter(event__isnull=False)` on occurrence queryset. (`ami/exports/format_types.py:199`) +2. **Occurrences without determinations produce empty `scientificName`** — GBIF treats this as required. Need `.filter(determination__isnull=False)`. (`ami/exports/format_types.py:199`) +3. **`individualCount` semantics wrong** — `detections_count` = bounding boxes across frames, not individuals. Each AMI occurrence is one individual. Should emit `1` or omit. (`ami/exports/dwca.py:87`) +4. **`vernacularName` operator precedence** — `x or "" if y else ""` should be `(x or "") if y else ""`. (`ami/exports/dwca.py:78-79`) +5. **Temp files never cleaned up** — event.txt, occurrence.txt, zip temp file leak on worker. (`ami/exports/format_types.py:238-264`) + +## Near follow-up (before real GBIF submission) + +- **Apply project default filters** to occurrence queryset — without this, low-confidence ML determinations get published to GBIF. Biggest data quality risk. 
+- **Add `license` field** on events — GBIF requires a dataset license for reuse terms. +- **Add `identifiedBy` / `dateIdentified`** — provenance for who/what made the determination. +- **Add `associatedMedia`** — detection image URLs (pipe-separated). Primary evidence for an image-based platform. +- **Runtime validation before packaging** — check for missing required fields, orphaned references, before creating the ZIP. +- **Upgrade EML to 2.2.0** — current code uses 2.1.1, GBIF recommends 2.2.0. The reference doc already shows 2.2.0. + +## Eventual follow-up + +- EML geographic/temporal coverage computed from actual data (bounding box, date range) +- `country`, `stateProvince`, `locality` on events (requires reverse geocoding or Site model fields) +- `coordinateUncertaintyInMeters` +- `institutionCode`, `collectionCode` (project-level settings) +- `scientificNameAuthorship` from `Taxon.author` +- `eventType` field +- Multimedia extension file (`multimedia.txt`) +- GBIF Data Validator automated integration test +- IPT server integration / acting as IPT endpoint for GBIF crawling + +## Nice to haves + +- Use `default` attribute in meta.xml for constant fields (`basisOfRecord`, `geodeticDatum`, etc.) 
to reduce file size +- Filter events to only those that have occurrences in the export +- Guard against `ZeroDivisionError` in progress callback when `total_records` is 0 From 576cba6ee7af44d659455d7dfd978f918d048829 Mon Sep 17 00:00:00 2001 From: Michael Bunsen Date: Wed, 11 Feb 2026 13:33:40 -0800 Subject: [PATCH 06/15] fix(exports): fix null guards and field semantics in DwC-A mappings - Guard taxonRank against None rank (AttributeError on .lower()) - Guard _get_rank_from_parents against None det_rank - Fix vernacularName ternary precedence with explicit parentheses - Change individualCount to emit "1" (each occurrence = 1 individual, not detections_count which counts bounding boxes) - Guard _format_duration against negative durations Co-Authored-By: Claude --- ami/exports/dwca.py | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/ami/exports/dwca.py b/ami/exports/dwca.py index 1936252d6..baa9bb5c2 100644 --- a/ami/exports/dwca.py +++ b/ami/exports/dwca.py @@ -60,7 +60,11 @@ (DWC + "basisOfRecord", "basisOfRecord", lambda o, slug: "MachineObservation"), (DWC + "occurrenceStatus", "occurrenceStatus", lambda o, slug: "present"), (DWC + "scientificName", "scientificName", lambda o, slug: o.determination.name if o.determination else ""), - (DWC + "taxonRank", "taxonRank", lambda o, slug: o.determination.rank.lower() if o.determination else ""), + ( + DWC + "taxonRank", + "taxonRank", + lambda o, slug: (o.determination.rank.lower() if o.determination and o.determination.rank else ""), + ), (DWC + "kingdom", "kingdom", lambda o, slug: _get_rank_from_parents(o, "KINGDOM")), (DWC + "phylum", "phylum", lambda o, slug: _get_rank_from_parents(o, "PHYLUM")), (DWC + "class", "class", lambda o, slug: _get_rank_from_parents(o, "CLASS")), @@ -75,7 +79,7 @@ ( DWC + "vernacularName", "vernacularName", - lambda o, slug: o.determination.common_name_en or "" if o.determination else "", + lambda o, slug: (o.determination.common_name_en or "") if 
o.determination else "", ), ( DWC + "taxonID", @@ -84,7 +88,7 @@ if o.determination and o.determination.gbif_taxon_key else "", ), - (DWC + "individualCount", "individualCount", lambda o, slug: str(getattr(o, "detections_count", 0) or 0)), + (DWC + "individualCount", "individualCount", lambda o, slug: "1"), ( DWC + "identificationVerificationStatus", "identificationVerificationStatus", @@ -135,6 +139,8 @@ def _format_duration(event) -> str: return "" delta = event.end - event.start total_seconds = int(delta.total_seconds()) + if total_seconds < 0: + return "" hours, remainder = divmod(total_seconds, 3600) minutes, _ = divmod(remainder, 60) if hours > 0: @@ -158,7 +164,7 @@ def _get_rank_from_parents(occurrence, rank: str) -> str: return parent.name if hasattr(parent, "name") else parent.get("name", "") # Also check the determination itself if it matches the requested rank det_rank = occurrence.determination.rank - if det_rank.upper() == rank: + if det_rank and det_rank.upper() == rank: return occurrence.determination.name return "" From a74aee9831debc32774110d8fa495850483f86fd Mon Sep 17 00:00:00 2001 From: Michael Bunsen Date: Wed, 11 Feb 2026 13:34:03 -0800 Subject: [PATCH 07/15] fix(exports): filter null event/determination and fix PII leak in EML - Filter out occurrences with null event or determination from DwC-A export queryset (GBIF rejects empty coreid/scientificName) - Replace project.owner.email with project.owner.name in EML creator element to avoid leaking PII in downloadable archives - Only emit individualName when owner has a name set Co-Authored-By: Claude --- ami/exports/dwca.py | 4 ++-- ami/exports/format_types.py | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/ami/exports/dwca.py b/ami/exports/dwca.py index baa9bb5c2..1cacf555d 100644 --- a/ami/exports/dwca.py +++ b/ami/exports/dwca.py @@ -312,10 +312,10 @@ def generate_eml_xml(project, events_queryset=None) -> str: creator = ET.SubElement(dataset, "creator") org = 
ET.SubElement(creator, "organizationName") org.text = "Automated Monitoring of Insects (AMI)" - if project.owner: + if project.owner and project.owner.name: individual = ET.SubElement(creator, "individualName") surname = ET.SubElement(individual, "surName") - surname.text = project.owner.email + surname.text = project.owner.name # Abstract abstract = ET.SubElement(dataset, "abstract") diff --git a/ami/exports/format_types.py b/ami/exports/format_types.py index f5eaf345b..13a9d5589 100644 --- a/ami/exports/format_types.py +++ b/ami/exports/format_types.py @@ -197,7 +197,7 @@ def get_queryset(self): """Return the occurrence queryset (used by BaseExporter for record count).""" return ( Occurrence.objects.valid() # type: ignore[union-attr] - .filter(project=self.project) + .filter(project=self.project, event__isnull=False, determination__isnull=False) .select_related( "determination", "event", From ad1b910988bfc478c2a537117e01083a22ec6971 Mon Sep 17 00:00:00 2001 From: Michael Bunsen Date: Wed, 11 Feb 2026 15:26:06 -0800 Subject: [PATCH 08/15] fix(exports): temp file cleanup, timezone, and EML schema fixes - Wrap DwC-A export in try/finally to clean up intermediate temp files - Use timezone.now() instead of naive datetime.datetime.now() - Use full EML schemaLocation URL for GBIF validation compatibility - Remove unused events_queryset parameter from generate_eml_xml - Simplify progress_callback lambda to direct method reference Co-Authored-By: Claude --- ami/exports/dwca.py | 10 +++++--- ami/exports/format_types.py | 51 ++++++++++++++++++++++--------------- 2 files changed, 37 insertions(+), 24 deletions(-) diff --git a/ami/exports/dwca.py b/ami/exports/dwca.py index 1cacf555d..38f19b57e 100644 --- a/ami/exports/dwca.py +++ b/ami/exports/dwca.py @@ -289,17 +289,21 @@ def generate_meta_xml( # ────────────────────────────────────────────────────────────── -def generate_eml_xml(project, events_queryset=None) -> str: +def generate_eml_xml(project) -> str: """Generate 
minimal EML 2.1.1 metadata XML for the dataset.""" + from django.utils import timezone project_slug = slugify(project.name) - now = datetime.datetime.now().strftime("%Y-%m-%dT%H:%M:%S") + now = timezone.now().strftime("%Y-%m-%dT%H:%M:%S") eml = ET.Element("eml:eml") eml.set("xmlns:eml", "eml://ecoinformatics.org/eml-2.1.1") eml.set("xmlns:dc", "http://purl.org/dc/terms/") eml.set("xmlns:xsi", "http://www.w3.org/2001/XMLSchema-instance") - eml.set("xsi:schemaLocation", "eml://ecoinformatics.org/eml-2.1.1 eml.xsd") + eml.set( + "xsi:schemaLocation", + "eml://ecoinformatics.org/eml-2.1.1 https://eml.ecoinformatics.org/eml-2.1.1/eml.xsd", + ) eml.set("packageId", f"urn:ami:dataset:{project_slug}:{now}") eml.set("system", "AMI") diff --git a/ami/exports/format_types.py b/ami/exports/format_types.py index 13a9d5589..707c14c71 100644 --- a/ami/exports/format_types.py +++ b/ami/exports/format_types.py @@ -1,6 +1,7 @@ import csv import json import logging +import os import tempfile from django.core.serializers.json import DjangoJSONEncoder @@ -237,28 +238,36 @@ def export(self): # Write event.txt event_file = tempfile.NamedTemporaryFile(delete=False, suffix=".txt", mode="w", encoding="utf-8") event_file.close() - events_qs = self.get_events_queryset() - event_count = write_tsv(event_file.name, EVENT_FIELDS, events_qs, project_slug) - logger.info(f"DwC-A: wrote {event_count} events") - # Write occurrence.txt occ_file = tempfile.NamedTemporaryFile(delete=False, suffix=".txt", mode="w", encoding="utf-8") occ_file.close() - occ_count = write_tsv( - occ_file.name, - OCCURRENCE_FIELDS, - self.queryset, - project_slug, - progress_callback=lambda n: self.update_job_progress(n), - ) - logger.info(f"DwC-A: wrote {occ_count} occurrences") - # Generate metadata - meta_xml = generate_meta_xml(EVENT_FIELDS, OCCURRENCE_FIELDS) - eml_xml = generate_eml_xml(self.project, events_qs) - - # Package into ZIP - zip_path = create_dwca_zip(event_file.name, occ_file.name, meta_xml, eml_xml) - - 
self.update_export_stats(file_temp_path=zip_path) - return zip_path + try: + events_qs = self.get_events_queryset() + event_count = write_tsv(event_file.name, EVENT_FIELDS, events_qs, project_slug) + logger.info(f"DwC-A: wrote {event_count} events") + + occ_count = write_tsv( + occ_file.name, + OCCURRENCE_FIELDS, + self.queryset, + project_slug, + progress_callback=self.update_job_progress, + ) + logger.info(f"DwC-A: wrote {occ_count} occurrences") + + # Generate metadata + meta_xml = generate_meta_xml(EVENT_FIELDS, OCCURRENCE_FIELDS) + eml_xml = generate_eml_xml(self.project) + + # Package into ZIP + zip_path = create_dwca_zip(event_file.name, occ_file.name, meta_xml, eml_xml) + + self.update_export_stats(file_temp_path=zip_path) + return zip_path + finally: + for path in [event_file.name, occ_file.name]: + try: + os.unlink(path) + except OSError: + pass From 556e1044a963cc7cb0901a1bb57c320fb045c909 Mon Sep 17 00:00:00 2001 From: Michael Bunsen Date: Wed, 11 Feb 2026 15:27:07 -0800 Subject: [PATCH 09/15] fix(exports): update tests and docs for DwC-A review fixes - Update occurrence row count test to match null-filtered queryset - Assert parents_json is populated in taxonomy hierarchy test - Use settings.MEDIA_URL instead of hardcoded "/media/" in test helper - Add DwCAExporter to export-framework.md file table and registry example - Fix EML version in dwca-format-reference.md to match implementation (2.1.1) Co-Authored-By: Claude --- ami/exports/tests.py | 27 ++++++++++++++++----------- docs/claude/dwca-format-reference.md | 5 +++-- docs/claude/export-framework.md | 4 ++-- 3 files changed, 21 insertions(+), 15 deletions(-) diff --git a/ami/exports/tests.py b/ami/exports/tests.py index 78d1e7207..fc6510244 100644 --- a/ami/exports/tests.py +++ b/ami/exports/tests.py @@ -328,6 +328,8 @@ def setUp(self): def _run_export(self): """Run a DwC-A export and return the file path.""" + from django.conf import settings + data_export = DataExport.objects.create( 
user=self.user, project=self.project, @@ -336,7 +338,7 @@ def _run_export(self): ) file_url = data_export.run_export() self.assertIsNotNone(file_url) - file_path = file_url.replace("/media/", "") + file_path = file_url.replace(settings.MEDIA_URL, "") self.assertTrue(default_storage.exists(file_path)) return file_path @@ -402,9 +404,11 @@ def test_occurrence_headers_and_row_count(self): self.assertIn("basisOfRecord", reader.fieldnames) self.assertIn("taxonRank", reader.fieldnames) - # Row count should match valid occurrences + # Row count should match valid occurrences with event and determination expected_count = ( - Occurrence.objects.valid().filter(project=self.project).count() # type: ignore[union-attr] + Occurrence.objects.valid() # type: ignore[union-attr] + .filter(project=self.project, event__isnull=False, determination__isnull=False) + .count() ) self.assertEqual(len(rows), expected_count, "Occurrence row count mismatch") @@ -492,14 +496,15 @@ def test_taxonomy_hierarchy_extraction(self): taxon.save(update_calculated_fields=True) taxon.refresh_from_db() - # If the taxon has parents, at least one rank should resolve - if taxon.parents_json: - ranks_found = [] - for rank in ["KINGDOM", "PHYLUM", "CLASS", "ORDER", "FAMILY", "GENUS"]: - value = _get_rank_from_parents(occurrence, rank) - if value: - ranks_found.append(rank) - self.assertGreater(len(ranks_found), 0, "No taxonomy ranks extracted from parents_json") + # Ensure parents_json is populated so this test doesn't pass vacuously + self.assertTrue(taxon.parents_json, "Test taxon should have parents_json populated") + + ranks_found = [] + for rank in ["KINGDOM", "PHYLUM", "CLASS", "ORDER", "FAMILY", "GENUS"]: + value = _get_rank_from_parents(occurrence, rank) + if value: + ranks_found.append(rank) + self.assertGreater(len(ranks_found), 0, "No taxonomy ranks extracted from parents_json") def test_specific_epithet_extraction(self): """get_specific_epithet should extract the second word of a binomial name.""" 
diff --git a/docs/claude/dwca-format-reference.md b/docs/claude/dwca-format-reference.md index dec394bce..0ef9ce2aa 100644 --- a/docs/claude/dwca-format-reference.md +++ b/docs/claude/dwca-format-reference.md @@ -93,9 +93,10 @@ Describes the dataset: title, abstract, creators, geographic/temporal coverage, ```xml - + {project.name} diff --git a/docs/claude/export-framework.md b/docs/claude/export-framework.md index 7712ba7e1..716a7abce 100644 --- a/docs/claude/export-framework.md +++ b/docs/claude/export-framework.md @@ -10,7 +10,7 @@ The export system uses a registry pattern where format-specific exporters regist |------|---------| | `ami/exports/base.py` | `BaseExporter` ABC - all exporters inherit from this | | `ami/exports/registry.py` | `ExportRegistry` - maps format strings to exporter classes | -| `ami/exports/format_types.py` | Concrete exporters: `JSONExporter`, `CSVExporter` | +| `ami/exports/format_types.py` | Concrete exporters: `JSONExporter`, `CSVExporter`, `DwCAExporter` | | `ami/exports/models.py` | `DataExport` model - tracks export jobs, files, stats | | `ami/exports/utils.py` | `apply_filters()`, `get_data_in_batches()`, `generate_fake_request()` | | `ami/exports/views.py` | `DataExportViewSet` - API endpoint for creating/listing exports | @@ -67,7 +67,7 @@ class BaseExporter(ABC): ```python ExportRegistry.register("format_name")(ExporterClass) ExportRegistry.get_exporter("format_name") # → ExporterClass -ExportRegistry.get_supported_formats() # → ["occurrences_api_json", "occurrences_simple_csv"] +ExportRegistry.get_supported_formats() # → ["occurrences_api_json", "occurrences_simple_csv", "dwca"] ``` ### DataExport Model (ami/exports/models.py) From 0927ddbd47a52147acf1a1d0d96f5466916fcd82 Mon Sep 17 00:00:00 2001 From: Michael Bunsen Date: Wed, 11 Feb 2026 15:33:13 -0800 Subject: [PATCH 10/15] fix(exports): meta.xml field mappings, enclosure char, and progress update - Map all columns (including index 0) to DwC term URIs in meta.xml so GBIF 
validators can resolve both <id>/<coreid> and <field> entries - Change fieldsEnclosedBy from empty string to double-quote character for stricter parser compatibility - Add final progress update after TSV writing so small exports (<500 records) report completion instead of staying at 0% Co-Authored-By: Claude --- ami/exports/dwca.py | 12 ++++-------- ami/exports/format_types.py | 4 ++++ 2 files changed, 8 insertions(+), 8 deletions(-) diff --git a/ami/exports/dwca.py b/ami/exports/dwca.py index 38f19b57e..68259f41a 100644 --- a/ami/exports/dwca.py +++ b/ami/exports/dwca.py @@ -236,7 +236,7 @@ def generate_meta_xml( core.set("encoding", "UTF-8") core.set("fieldsTerminatedBy", "\\t") core.set("linesTerminatedBy", "\\n") - core.set("fieldsEnclosedBy", "") + core.set("fieldsEnclosedBy", '"') core.set("ignoreHeaderLines", "1") files = ET.SubElement(core, "files") @@ -247,9 +247,7 @@ def generate_meta_xml( id_elem = ET.SubElement(core, "id") id_elem.set("index", "0") - for i, (term_uri, header, _) in enumerate(event_fields): - if i == 0: - continue # Already declared as <id> + for i, (term_uri, _header, _) in enumerate(event_fields): field = ET.SubElement(core, "field") field.set("index", str(i)) field.set("term", term_uri) @@ -260,7 +258,7 @@ def generate_meta_xml( extension.set("encoding", "UTF-8") extension.set("fieldsTerminatedBy", "\\t") extension.set("linesTerminatedBy", "\\n") - extension.set("fieldsEnclosedBy", "") + extension.set("fieldsEnclosedBy", '"') extension.set("ignoreHeaderLines", "1") files = ET.SubElement(extension, "files") @@ -271,9 +269,7 @@ def generate_meta_xml( coreid = ET.SubElement(extension, "coreid") coreid.set("index", "0") - for i, (term_uri, header, _) in enumerate(occurrence_fields): - if i == 0: - continue # Already declared as <coreid> + for i, (term_uri, _header, _) in enumerate(occurrence_fields): field = ET.SubElement(extension, "field") field.set("index", str(i)) field.set("term", term_uri) diff --git a/ami/exports/format_types.py b/ami/exports/format_types.py index 
707c14c71..24613c194 100644 --- a/ami/exports/format_types.py +++ b/ami/exports/format_types.py @@ -256,6 +256,10 @@ def export(self): ) logger.info(f"DwC-A: wrote {occ_count} occurrences") + # Ensure final progress update for small exports (<500 records) + if self.total_records: + self.update_job_progress(occ_count) + # Generate metadata meta_xml = generate_meta_xml(EVENT_FIELDS, OCCURRENCE_FIELDS) eml_xml = generate_eml_xml(self.project) From 2f5d381e2bb7fd7abf4f148bf7f28936c2248b5b Mon Sep 17 00:00:00 2001 From: Michael Bunsen Date: Wed, 11 Feb 2026 15:34:21 -0800 Subject: [PATCH 11/15] test(exports): optimize DwC-A tests with setUpClass shared export Run the export pipeline once in setUpClass and share the ZIP across all structural validation tests instead of re-running per test method. This reduces test time from ~7 export runs to 1. Co-Authored-By: Claude --- ami/exports/tests.py | 298 +++++++++++++++++++++---------------------- 1 file changed, 143 insertions(+), 155 deletions(-) diff --git a/ami/exports/tests.py b/ami/exports/tests.py index fc6510244..c18b2cd95 100644 --- a/ami/exports/tests.py +++ b/ami/exports/tests.py @@ -308,40 +308,52 @@ def test_non_member_cannot_create_export(self): class DwCAExportTest(TestCase): - """Tests for Darwin Core Archive (DwC-A) export format.""" - - def setUp(self): - self.project, self.deployment = setup_test_project(reuse=False) - self.user = self.project.owner - create_captures(deployment=self.deployment, num_nights=2, images_per_night=4, interval_minutes=1) - group_images_into_events(self.deployment) - create_taxa(self.project) - create_occurrences(num=10, deployment=self.deployment) - - # Verify test data was created - self.assertGreater(self.project.events.count(), 0, "No events created for testing.") - self.assertGreater( - Occurrence.objects.valid().filter(project=self.project).count(), # type: ignore[union-attr] - 0, - "No valid occurrences created for testing.", - ) - - def _run_export(self): - """Run a DwC-A 
export and return the file path.""" + """Tests for Darwin Core Archive (DwC-A) export format. + + Uses setUpClass to run the export once and share the ZIP across + structural validation tests for better performance. + """ + + @classmethod + def setUpClass(cls): + super().setUpClass() + cls.project, cls.deployment = setup_test_project(reuse=False) + cls.user = cls.project.owner + create_captures(deployment=cls.deployment, num_nights=2, images_per_night=4, interval_minutes=1) + group_images_into_events(cls.deployment) + create_taxa(cls.project) + create_occurrences(num=10, deployment=cls.deployment) + + # Run the export once and cache the file path + cls._export_file_path = cls._create_export(cls.project, cls.user) + + @classmethod + def tearDownClass(cls): + if cls._export_file_path and default_storage.exists(cls._export_file_path): + default_storage.delete(cls._export_file_path) + super().tearDownClass() + + @staticmethod + def _create_export(project, user): + """Run a DwC-A export and return the storage file path.""" from django.conf import settings data_export = DataExport.objects.create( - user=self.user, - project=self.project, + user=user, + project=project, format="dwca", job=None, ) file_url = data_export.run_export() - self.assertIsNotNone(file_url) + assert file_url is not None, "Export did not produce a file URL" file_path = file_url.replace(settings.MEDIA_URL, "") - self.assertTrue(default_storage.exists(file_path)) + assert default_storage.exists(file_path), f"Export file not found: {file_path}" return file_path + def _open_zip(self): + """Open the cached export ZIP for reading.""" + return default_storage.open(self._export_file_path, "rb") + def test_dwca_exporter_is_registered(self): """DwC-A exporter should be registered and retrievable.""" from ami.exports.registry import ExportRegistry @@ -352,131 +364,111 @@ def test_dwca_exporter_is_registered(self): def test_export_produces_valid_zip(self): """Export should produce a valid ZIP with expected 
files.""" - file_path = self._run_export() - try: - with default_storage.open(file_path, "rb") as f: - self.assertTrue(zipfile.is_zipfile(f)) - f.seek(0) - with zipfile.ZipFile(f, "r") as zf: - names = zf.namelist() - self.assertIn("event.txt", names) - self.assertIn("occurrence.txt", names) - self.assertIn("meta.xml", names) - self.assertIn("eml.xml", names) - finally: - default_storage.delete(file_path) + with self._open_zip() as f: + self.assertTrue(zipfile.is_zipfile(f)) + f.seek(0) + with zipfile.ZipFile(f, "r") as zf: + names = zf.namelist() + self.assertIn("event.txt", names) + self.assertIn("occurrence.txt", names) + self.assertIn("meta.xml", names) + self.assertIn("eml.xml", names) def test_event_headers_and_row_count(self): """event.txt should have correct headers and row count matching events.""" - file_path = self._run_export() - try: - with default_storage.open(file_path, "rb") as f: - with zipfile.ZipFile(f, "r") as zf: - event_data = zf.read("event.txt").decode("utf-8") - reader = csv.DictReader(StringIO(event_data), delimiter="\t") - rows = list(reader) - - # Check headers - self.assertIn("eventID", reader.fieldnames) - self.assertIn("eventDate", reader.fieldnames) - self.assertIn("decimalLatitude", reader.fieldnames) - self.assertIn("samplingProtocol", reader.fieldnames) - - # Row count should match project events - expected_count = self.project.events.count() - self.assertEqual(len(rows), expected_count, "Event row count mismatch") - finally: - default_storage.delete(file_path) + with self._open_zip() as f: + with zipfile.ZipFile(f, "r") as zf: + event_data = zf.read("event.txt").decode("utf-8") + reader = csv.DictReader(StringIO(event_data), delimiter="\t") + rows = list(reader) + + # Check headers + self.assertIn("eventID", reader.fieldnames) + self.assertIn("eventDate", reader.fieldnames) + self.assertIn("decimalLatitude", reader.fieldnames) + self.assertIn("samplingProtocol", reader.fieldnames) + + # Row count should match project events + 
expected_count = self.project.events.count() + self.assertEqual(len(rows), expected_count, "Event row count mismatch") def test_occurrence_headers_and_row_count(self): """occurrence.txt should have correct headers and row count matching valid occurrences.""" - file_path = self._run_export() - try: - with default_storage.open(file_path, "rb") as f: - with zipfile.ZipFile(f, "r") as zf: - occ_data = zf.read("occurrence.txt").decode("utf-8") - reader = csv.DictReader(StringIO(occ_data), delimiter="\t") - rows = list(reader) - - # Check headers - self.assertIn("occurrenceID", reader.fieldnames) - self.assertIn("scientificName", reader.fieldnames) - self.assertIn("basisOfRecord", reader.fieldnames) - self.assertIn("taxonRank", reader.fieldnames) - - # Row count should match valid occurrences with event and determination - expected_count = ( - Occurrence.objects.valid() # type: ignore[union-attr] - .filter(project=self.project, event__isnull=False, determination__isnull=False) - .count() - ) - self.assertEqual(len(rows), expected_count, "Occurrence row count mismatch") - - # All rows should have basisOfRecord = MachineObservation - for row in rows: - self.assertEqual(row["basisOfRecord"], "MachineObservation") - finally: - default_storage.delete(file_path) + with self._open_zip() as f: + with zipfile.ZipFile(f, "r") as zf: + occ_data = zf.read("occurrence.txt").decode("utf-8") + reader = csv.DictReader(StringIO(occ_data), delimiter="\t") + rows = list(reader) + + # Check headers + self.assertIn("occurrenceID", reader.fieldnames) + self.assertIn("scientificName", reader.fieldnames) + self.assertIn("basisOfRecord", reader.fieldnames) + self.assertIn("taxonRank", reader.fieldnames) + + # Row count should match valid occurrences with event and determination + expected_count = ( + Occurrence.objects.valid() # type: ignore[union-attr] + .filter(project=self.project, event__isnull=False, determination__isnull=False) + .count() + ) + self.assertEqual(len(rows), expected_count, 
"Occurrence row count mismatch") + + # All rows should have basisOfRecord = MachineObservation + for row in rows: + self.assertEqual(row["basisOfRecord"], "MachineObservation") def test_meta_xml_structure(self): """meta.xml should be valid XML with correct core/extension structure.""" - file_path = self._run_export() - try: - with default_storage.open(file_path, "rb") as f: - with zipfile.ZipFile(f, "r") as zf: - meta_xml = zf.read("meta.xml").decode("utf-8") - root = ET.fromstring(meta_xml) - - # Default namespace - ns = "http://rs.tdwg.org/dwc/text/" - - # Should have a core element with Event rowType - core = root.find(f"{{{ns}}}core") - self.assertIsNotNone(core, "meta.xml missing <core> element") - self.assertIn("Event", core.get("rowType", "")) - - # Should have an extension element with Occurrence rowType - ext = root.find(f"{{{ns}}}extension") - self.assertIsNotNone(ext, "meta.xml missing <extension> element") - self.assertIn("Occurrence", ext.get("rowType", "")) - - # Core should reference event.txt - core_location = core.find(f".//{{{ns}}}location") - self.assertIsNotNone(core_location, "meta.xml core missing <location>") - self.assertEqual(core_location.text, "event.txt") - - # Extension should reference occurrence.txt - ext_location = ext.find(f".//{{{ns}}}location") - self.assertIsNotNone(ext_location, "meta.xml extension missing <location>") - self.assertEqual(ext_location.text, "occurrence.txt") - finally: - default_storage.delete(file_path) + with self._open_zip() as f: + with zipfile.ZipFile(f, "r") as zf: + meta_xml = zf.read("meta.xml").decode("utf-8") + root = ET.fromstring(meta_xml) + + # Default namespace + ns = "http://rs.tdwg.org/dwc/text/" + + # Should have a core element with Event rowType + core = root.find(f"{{{ns}}}core") + self.assertIsNotNone(core, "meta.xml missing <core> element") + self.assertIn("Event", core.get("rowType", "")) + + # Should have an extension element with Occurrence rowType + ext = root.find(f"{{{ns}}}extension") + self.assertIsNotNone(ext, "meta.xml missing <extension> 
element") + self.assertIn("Occurrence", ext.get("rowType", "")) + + # Core should reference event.txt + core_location = core.find(f".//{{{ns}}}location") + self.assertIsNotNone(core_location, "meta.xml core missing <location>") + self.assertEqual(core_location.text, "event.txt") + + # Extension should reference occurrence.txt + ext_location = ext.find(f".//{{{ns}}}location") + self.assertIsNotNone(ext_location, "meta.xml extension missing <location>") + self.assertEqual(ext_location.text, "occurrence.txt") def test_referential_integrity(self): """All occurrence eventIDs should reference existing event eventIDs.""" - file_path = self._run_export() - try: - with default_storage.open(file_path, "rb") as f: - with zipfile.ZipFile(f, "r") as zf: - # Read event IDs - event_data = zf.read("event.txt").decode("utf-8") - event_reader = csv.DictReader(StringIO(event_data), delimiter="\t") - event_ids = {row["eventID"] for row in event_reader} - - # Read occurrence eventIDs - occ_data = zf.read("occurrence.txt").decode("utf-8") - occ_reader = csv.DictReader(StringIO(occ_data), delimiter="\t") - occ_event_ids = {row["eventID"] for row in occ_reader if row["eventID"]} - - # All occurrence eventIDs should exist in events - orphaned = occ_event_ids - event_ids - self.assertEqual( - len(orphaned), - 0, - f"Orphaned occurrence eventIDs (not in events): {orphaned}", - ) - finally: - default_storage.delete(file_path) + with self._open_zip() as f: + with zipfile.ZipFile(f, "r") as zf: + # Read event IDs + event_data = zf.read("event.txt").decode("utf-8") + event_reader = csv.DictReader(StringIO(event_data), delimiter="\t") + event_ids = {row["eventID"] for row in event_reader} + + # Read occurrence eventIDs + occ_data = zf.read("occurrence.txt").decode("utf-8") + occ_reader = csv.DictReader(StringIO(occ_data), delimiter="\t") + occ_event_ids = {row["eventID"] for row in occ_reader if row["eventID"]} + + # All occurrence eventIDs should exist in events + orphaned = occ_event_ids - event_ids + 
self.assertEqual( + len(orphaned), + 0, + f"Orphaned occurrence eventIDs (not in events): {orphaned}", + ) def test_taxonomy_hierarchy_extraction(self): """Taxonomy fields should be extracted from parents_json.""" @@ -517,21 +509,17 @@ def test_specific_epithet_extraction(self): def test_eml_xml_valid(self): """eml.xml should be valid XML with project metadata.""" - file_path = self._run_export() - try: - with default_storage.open(file_path, "rb") as f: - with zipfile.ZipFile(f, "r") as zf: - eml_xml = zf.read("eml.xml").decode("utf-8") - root = ET.fromstring(eml_xml) - - # Should have a dataset element - ns = {"eml": "eml://ecoinformatics.org/eml-2.1.1"} - dataset = root.find("eml:dataset", ns) or root.find("dataset") - self.assertIsNotNone(dataset, "eml.xml missing <dataset> element") - - # Title should match project name - title = dataset.find("eml:title", ns) or dataset.find("title") - self.assertIsNotNone(title) - self.assertEqual(title.text, self.project.name) - finally: - default_storage.delete(file_path) + with self._open_zip() as f: + with zipfile.ZipFile(f, "r") as zf: + eml_xml = zf.read("eml.xml").decode("utf-8") + root = ET.fromstring(eml_xml) + + # Should have a dataset element + ns = {"eml": "eml://ecoinformatics.org/eml-2.1.1"} + dataset = root.find("eml:dataset", ns) or root.find("dataset") + self.assertIsNotNone(dataset, "eml.xml missing <dataset> element") + + # Title should match project name + title = dataset.find("eml:title", ns) or dataset.find("title") + self.assertIsNotNone(title) + self.assertEqual(title.text, self.project.name) From c43d4069b0882e1f15b3fead0d840fc0c62b56ed Mon Sep 17 00:00:00 2001 From: Michael Bunsen Date: Wed, 11 Feb 2026 15:58:44 -0800 Subject: [PATCH 12/15] fix(exports): enable filter backends and derive events from filtered occurrences in DwC-A Remove get_filter_backends() override that returned [], allowing DwCAExporter to inherit BaseExporter's OccurrenceCollectionFilter. 
Update get_events_queryset() to derive events from self.queryset instead of fetching all project events, preventing orphaned events when collection_id filtering is active. Co-Authored-By: Claude --- ami/exports/format_types.py | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/ami/exports/format_types.py b/ami/exports/format_types.py index 24613c194..eb3ab8fef 100644 --- a/ami/exports/format_types.py +++ b/ami/exports/format_types.py @@ -211,15 +211,15 @@ def get_queryset(self): def get_events_queryset(self): from ami.main.models import Event - return Event.objects.filter(project=self.project).select_related( + event_ids = self.queryset.values_list("event_id", flat=True).distinct() + return Event.objects.filter( + project=self.project, + id__in=event_ids, + ).select_related( "deployment", "project", ) - def get_filter_backends(self): - # DwC-A exports events + occurrences; the collection-based filter doesn't apply - return [] - def export(self): """Export project data as a Darwin Core Archive ZIP.""" from django.utils.text import slugify From d11976e79ff2d4876c7927207b21c0ee02dff0ca Mon Sep 17 00:00:00 2001 From: Michael Bunsen Date: Wed, 11 Feb 2026 15:58:49 -0800 Subject: [PATCH 13/15] test(exports): add DwC-A collection filter test and fix event count assertion Add test_dwca_export_with_collection_filter that verifies filtered exports produce correct occurrence/event counts and referential integrity. Update test_event_headers_and_row_count to expect events derived from occurrences rather than all project events. 
Co-Authored-By: Claude --- ami/exports/tests.py | 90 +++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 88 insertions(+), 2 deletions(-) diff --git a/ami/exports/tests.py b/ami/exports/tests.py index c18b2cd95..3a1e6ca2b 100644 --- a/ami/exports/tests.py +++ b/ami/exports/tests.py @@ -388,8 +388,14 @@ def test_event_headers_and_row_count(self): self.assertIn("decimalLatitude", reader.fieldnames) self.assertIn("samplingProtocol", reader.fieldnames) - # Row count should match project events - expected_count = self.project.events.count() + # Row count should match events referenced by valid occurrences + expected_count = ( + Occurrence.objects.valid() # type: ignore[union-attr] + .filter(project=self.project, event__isnull=False, determination__isnull=False) + .values("event_id") + .distinct() + .count() + ) self.assertEqual(len(rows), expected_count, "Event row count mismatch") def test_occurrence_headers_and_row_count(self): @@ -523,3 +529,83 @@ def test_eml_xml_valid(self): title = dataset.find("eml:title", ns) or dataset.find("title") self.assertIsNotNone(title) self.assertEqual(title.text, self.project.name) + + def test_dwca_export_with_collection_filter(self): + """DwC-A export with collection_id filter should only include matching occurrences and their events.""" + # Create a collection with a subset of images + images = self.project.captures.all() + collection_images = images[: images.count() // 2] + self.assertGreater(len(collection_images), 0) + + collection = SourceImageCollection.objects.create( + name="DwCA Filter Test Collection", + project=self.project, + method="manual", + kwargs={"image_ids": [img.pk for img in collection_images]}, + ) + collection.populate_sample() + + # Run filtered export + data_export = DataExport.objects.create( + user=self.user, + project=self.project, + format="dwca", + filters={"collection_id": collection.pk}, + job=None, + ) + file_url = data_export.run_export() + self.assertIsNotNone(file_url) + + from django.conf 
import settings + + file_path = file_url.replace(settings.MEDIA_URL, "") + self.assertTrue(default_storage.exists(file_path)) + + try: + # Count expected filtered occurrences + expected_occ_count = ( + Occurrence.objects.valid() # type: ignore[union-attr] + .filter( + project=self.project, + event__isnull=False, + determination__isnull=False, + detections__source_image__collections=collection, + ) + .distinct() + .count() + ) + total_occ_count = ( + Occurrence.objects.valid() # type: ignore[union-attr] + .filter(project=self.project, event__isnull=False, determination__isnull=False) + .count() + ) + self.assertGreater(expected_occ_count, 0, "Filtered occurrences should not be empty") + self.assertLess(expected_occ_count, total_occ_count, "Filtered should be fewer than total") + + with default_storage.open(file_path, "rb") as f: + with zipfile.ZipFile(f, "r") as zf: + # Verify occurrence count + occ_data = zf.read("occurrence.txt").decode("utf-8") + occ_reader = csv.DictReader(StringIO(occ_data), delimiter="\t") + occ_rows = list(occ_reader) + self.assertEqual(len(occ_rows), expected_occ_count, "Filtered occurrence count mismatch") + + # Verify event count matches only events from filtered occurrences + event_data = zf.read("event.txt").decode("utf-8") + event_reader = csv.DictReader(StringIO(event_data), delimiter="\t") + event_rows = list(event_reader) + event_ids_in_file = {row["eventID"] for row in event_rows} + + # Events should only be those referenced by filtered occurrences + occ_event_ids = {row["eventID"] for row in occ_rows if row["eventID"]} + self.assertEqual( + event_ids_in_file, + occ_event_ids, + "Event IDs should match exactly those referenced by filtered occurrences", + ) + + # Referential integrity: no orphaned eventIDs in occurrences + orphaned = occ_event_ids - event_ids_in_file + self.assertEqual(len(orphaned), 0, f"Orphaned occurrence eventIDs: {orphaned}") + finally: + default_storage.delete(file_path) From 
e14139d77029d89caa6d8e785ccdbcdced75391f Mon Sep 17 00:00:00 2001 From: Michael Bunsen Date: Wed, 11 Feb 2026 17:50:19 -0800 Subject: [PATCH 14/15] docs(exports): add API and operations reference for export system Co-Authored-By: Claude --- docs/claude/export-system.md | 112 +++++++++++++++++++++++++++++++++++ 1 file changed, 112 insertions(+) create mode 100644 docs/claude/export-system.md diff --git a/docs/claude/export-system.md b/docs/claude/export-system.md new file mode 100644 index 000000000..09db213c3 --- /dev/null +++ b/docs/claude/export-system.md @@ -0,0 +1,112 @@ +# Export System — API & Operations Reference + +See also: `docs/claude/export-framework.md` for internal architecture and adding new formats. + +## API Endpoint + +`/api/v2/exports/` — `ExportViewSet` (`ami/exports/views.py:13`) + +### Methods + +| Method | Endpoint | Description | +|--------|----------|-------------| +| POST | `/api/v2/exports/` | Create export, enqueue async Celery job | +| GET | `/api/v2/exports/` | List exports (scoped to active project via `ProjectMixin`) | +| GET | `/api/v2/exports/{id}/` | Retrieve single export (job progress, file URL, record count) | +| PUT/PATCH | `/api/v2/exports/{id}/` | Update export (admin-only) | +| DELETE | `/api/v2/exports/{id}/` | Delete export and its file from storage | + +Permissions: `ObjectPermission` (`ami/base/permissions.py`). Researcher role can create and delete. Admin can update. Basic members and non-members cannot create. + +### Creating an Export (POST) + +**Required fields:** +- `project` (int) — Project PK +- `format` (string) — One of: `"occurrences_simple_csv"`, `"occurrences_api_json"`, `"dwca"` + +**Optional fields:** +- `filters` (object) — Filter criteria applied to occurrences + - `collection_id` (int) — Restrict to occurrences whose detections link to images in this `SourceImageCollection` + +**Validation** (`views.py:30-86`): +1. Format checked against `ExportRegistry.get_supported_formats()` +2. 
If `collection_id` provided, validates existence and project ownership +3. Object-level permission check on unsaved instance before persisting +4. Creates `DataExportJob` and enqueues via Celery + +**Response:** 201 with serialized `DataExport` including nested `job` object. + +### Response Fields + +Defined in `DataExportSerializer` (`ami/exports/serializers.py:30`): + +``` +id, user, project, format, filters, filters_display, +job {id, name, project, progress, result}, +file_url, record_count, file_size, file_size_display, +created_at, updated_at +``` + +- `file_url` — null until export completes, then absolute URL to file +- `file_size_display` — human-readable (e.g. "2.4 MB") +- `filters_display` — auto-populated with human names (e.g. collection name) +- `job.progress` — tracks export stages with percentage + +### Polling for Completion + +Exports run asynchronously. Poll `GET /api/v2/exports/{id}/` and check: +- `job.progress` for stage updates +- `file_url` becomes non-null when export is ready for download + +## Registered Formats + +Registered in `ami/exports/registry.py:28-30`, implemented in `ami/exports/format_types.py`: + +| Key | Class | Output | Description | +|-----|-------|--------|-------------| +| `occurrences_simple_csv` | `CSVExporter` (:149) | `.csv` | Tabular occurrence data with detection fields | +| `occurrences_api_json` | `JSONExporter` (:39) | `.json` | Full API serialization of occurrences | +| `dwca` | `DwCAExporter` (:192) | `.zip` | Darwin Core Archive with event.txt + occurrence.txt + meta.xml + eml.xml | + +## Filter System + +All exporters inherit `OccurrenceCollectionFilter` from `BaseExporter.get_filter_backends()` (`base.py:42-45`). 
+ +**OccurrenceCollectionFilter** (`ami/main/api/views.py:981-998`): +- Accepts `collection_id` or `collection` query param +- Filters: `queryset.filter(detections__source_image__collections=collection_id).distinct()` +- No-op when param is absent — unfiltered exports work unchanged + +**How filters are applied in Celery context** (`ami/exports/utils.py`): +- `generate_fake_request()` creates a mock DRF Request with filter values as query params +- `apply_filters()` runs each filter backend's `filter_queryset()` against the exporter's queryset +- Called in `BaseExporter.__init__()` so `self.queryset` is already filtered before `export()` runs + +## DwC-A Specifics + +The DwC-A exporter produces two data files linked by `eventID`: + +- **event.txt** — Events derived from filtered occurrences (`get_events_queryset()` at `format_types.py:211`) +- **occurrence.txt** — Filtered occurrences with Darwin Core terms + +Events are not fetched independently — they're derived from `self.queryset.values_list("event_id").distinct()` to maintain referential integrity when filters are active. + +Field definitions: `ami/exports/dwca.py` — `EVENT_FIELDS` (:26), `OCCURRENCE_FIELDS` (:57). +See `docs/claude/dwca-format-reference.md` for Darwin Core term mappings. + +## Job Integration + +`DataExportJob` (`ami/jobs/models.py:682-716`): +1. Adds "Exporting data" progress stage +2. Calls `job.data_export.run_export()` +3. Adds "Uploading snapshot" stage with file URL +4. Finalizes job as SUCCESS + +`DataExport` has a OneToOne relation to `Job` via `job.data_export` (`models.py:841`). + +## File Lifecycle + +1. Exporter writes to temp file during `export()` +2. `DataExport.save_export_file()` uploads to `exports/` in default_storage (S3/MinIO) +3. `file_url` saved on model +4. 
On `DataExport` deletion: `pre_delete` signal (`ami/exports/signals.py:13`) removes file from storage From c8aadb768db7f130da0de71934489dcd60d3095d Mon Sep 17 00:00:00 2001 From: Michael Bunsen Date: Wed, 11 Feb 2026 17:51:34 -0800 Subject: [PATCH 15/15] docs(exports): merge API reference into export-framework.md Add API methods, request/response format, filter system, DwC-A specifics, job integration, and file lifecycle details. Remove separate export-system.md. Co-Authored-By: Claude --- docs/claude/export-framework.md | 123 +++++++++++++++++++++++++++++--- docs/claude/export-system.md | 112 ----------------------------- 2 files changed, 113 insertions(+), 122 deletions(-) delete mode 100644 docs/claude/export-system.md diff --git a/docs/claude/export-framework.md b/docs/claude/export-framework.md index 716a7abce..c584e56a7 100644 --- a/docs/claude/export-framework.md +++ b/docs/claude/export-framework.md @@ -13,9 +13,10 @@ The export system uses a registry pattern where format-specific exporters regist | `ami/exports/format_types.py` | Concrete exporters: `JSONExporter`, `CSVExporter`, `DwCAExporter` | | `ami/exports/models.py` | `DataExport` model - tracks export jobs, files, stats | | `ami/exports/utils.py` | `apply_filters()`, `get_data_in_batches()`, `generate_fake_request()` | -| `ami/exports/views.py` | `DataExportViewSet` - API endpoint for creating/listing exports | +| `ami/exports/views.py` | `ExportViewSet` - API endpoint for creating/listing exports | | `ami/exports/serializers.py` | `DataExportSerializer` - validates format, filters | | `ami/exports/signals.py` | Deletes exported file when `DataExport` is deleted | +| `ami/exports/dwca.py` | DwC-A field definitions, XML generators, TSV writer | ### Flow @@ -31,6 +32,74 @@ The export system uses a registry pattern where format-specific exporters regist 9. 
file_url saved to DataExport model ``` +## API Endpoint + +`/api/v2/exports/` — `ExportViewSet` (`ami/exports/views.py:13`) + +### Methods + +| Method | Endpoint | Description | +|--------|----------|-------------| +| POST | `/api/v2/exports/` | Create export, enqueue async Celery job | +| GET | `/api/v2/exports/` | List exports (scoped to active project via `ProjectMixin`) | +| GET | `/api/v2/exports/{id}/` | Retrieve single export (job progress, file URL, record count) | +| PUT/PATCH | `/api/v2/exports/{id}/` | Update export (admin-only) | +| DELETE | `/api/v2/exports/{id}/` | Delete export and its file from storage | + +Permissions: `ObjectPermission` (`ami/base/permissions.py`). Researcher role can create and delete. Admin can update. Basic members and non-members cannot create. + +### Creating an Export (POST) + +**Required fields:** +- `project` (int) — Project PK +- `format` (string) — One of: `"occurrences_simple_csv"`, `"occurrences_api_json"`, `"dwca"` + +**Optional fields:** +- `filters` (object) — Filter criteria applied to occurrences + - `collection_id` (int) — Restrict to occurrences whose detections link to images in this `SourceImageCollection` + +**Validation** (`views.py:30-86`): +1. Format checked against `ExportRegistry.get_supported_formats()` +2. If `collection_id` provided, validates existence and project ownership +3. Object-level permission check on unsaved instance before persisting +4. Creates `DataExportJob` and enqueues via Celery + +**Response:** 201 with serialized `DataExport` including nested `job` object. + +### Response Fields + +Defined in `DataExportSerializer` (`ami/exports/serializers.py:30`): + +``` +id, user, project, format, filters, filters_display, +job {id, name, project, progress, result}, +file_url, record_count, file_size, file_size_display, +created_at, updated_at +``` + +- `file_url` — null until export completes, then absolute URL to file +- `file_size_display` — human-readable (e.g. 
"2.4 MB") +- `filters_display` — auto-populated with human names (e.g. collection name) +- `job.progress` — tracks export stages with percentage + +### Polling for Completion + +Exports run asynchronously. Poll `GET /api/v2/exports/{id}/` and check: +- `job.progress` for stage updates +- `file_url` becomes non-null when export is ready for download + +## Registered Formats + +Registered in `ami/exports/registry.py:28-30`, implemented in `ami/exports/format_types.py`: + +| Key | Class | Output | Description | +|-----|-------|--------|-------------| +| `occurrences_simple_csv` | `CSVExporter` (:149) | `.csv` | Tabular occurrence data with detection fields | +| `occurrences_api_json` | `JSONExporter` (:39) | `.json` | Full API serialization of occurrences | +| `dwca` | `DwCAExporter` (:192) | `.zip` | Darwin Core Archive with event.txt + occurrence.txt + meta.xml + eml.xml | + +## Internals + ### BaseExporter (ami/exports/base.py) ```python @@ -88,6 +157,49 @@ Key methods: - `generate_filename()` - `{project_slug}_export-{pk}.{ext}` - `get_exporter()` - cached exporter instance +### Filter System + +All exporters inherit `OccurrenceCollectionFilter` from `BaseExporter.get_filter_backends()` (`base.py:42-45`). 
+ +**OccurrenceCollectionFilter** (`ami/main/api/views.py:981-998`): +- Accepts `collection_id` or `collection` query param +- Filters: `queryset.filter(detections__source_image__collections=collection_id).distinct()` +- No-op when param is absent — unfiltered exports work unchanged + +**How filters are applied in Celery context** (`ami/exports/utils.py`): +- `generate_fake_request()` creates a mock DRF Request with filter values as query params +- `apply_filters()` runs each filter backend's `filter_queryset()` against the exporter's queryset +- Called in `BaseExporter.__init__()` so `self.queryset` is already filtered before `export()` runs + +### DwC-A Specifics + +The DwC-A exporter produces two data files linked by `eventID`: + +- **event.txt** — Events derived from filtered occurrences (`get_events_queryset()` at `format_types.py:211`) +- **occurrence.txt** — Filtered occurrences with Darwin Core terms + +Events are not fetched independently — they're derived from `self.queryset.values_list("event_id").distinct()` to maintain referential integrity when filters are active. + +Field definitions: `ami/exports/dwca.py` — `EVENT_FIELDS` (:26), `OCCURRENCE_FIELDS` (:57). +See `docs/claude/dwca-format-reference.md` for Darwin Core term mappings. + +### Job Integration + +`DataExportJob` (`ami/jobs/models.py:682-716`): +1. Adds "Exporting data" progress stage +2. Calls `job.data_export.run_export()` +3. Adds "Uploading snapshot" stage with file URL +4. Finalizes job as SUCCESS + +`DataExport` has a OneToOne relation to `Job` via `job.data_export` (`models.py:841`). + +### File Lifecycle + +1. Exporter writes to temp file during `export()` +2. `DataExport.save_export_file()` uploads to `exports/` in default_storage (S3/MinIO) +3. `file_url` saved on model +4. On `DataExport` deletion: `pre_delete` signal (`ami/exports/signals.py:13`) removes file from storage + ### Adding a New Export Format 1. 
Create exporter class extending `BaseExporter` @@ -101,12 +213,3 @@ Key methods: - `generate_fake_request()` - creates a DRF Request for serializer context (needed because exports run in Celery, not in HTTP request context) - `apply_filters(queryset, filters, filter_backends)` - applies DRF filter backends using fake request with filter query params - `get_data_in_batches(queryset, serializer_class, batch_size=100)` - yields batches of serialized data using queryset.iterator() - -### Important Notes - -- Exports run as Celery tasks, so no real HTTP request is available -- The `generate_fake_request()` utility creates a mock DRF request for serializer context (needed for HyperlinkedModelSerializer URLs) -- Filters are passed as query params on the fake request -- Default filter backend is `OccurrenceCollectionFilter` (filters by collection_id) -- The export file is written to a temp file, then uploaded to default_storage (S3/MinIO) -- On DataExport deletion, the signal handler deletes the file from storage diff --git a/docs/claude/export-system.md b/docs/claude/export-system.md deleted file mode 100644 index 09db213c3..000000000 --- a/docs/claude/export-system.md +++ /dev/null @@ -1,112 +0,0 @@ -# Export System — API & Operations Reference - -See also: `docs/claude/export-framework.md` for internal architecture and adding new formats. 
- -## API Endpoint - -`/api/v2/exports/` — `ExportViewSet` (`ami/exports/views.py:13`) - -### Methods - -| Method | Endpoint | Description | -|--------|----------|-------------| -| POST | `/api/v2/exports/` | Create export, enqueue async Celery job | -| GET | `/api/v2/exports/` | List exports (scoped to active project via `ProjectMixin`) | -| GET | `/api/v2/exports/{id}/` | Retrieve single export (job progress, file URL, record count) | -| PUT/PATCH | `/api/v2/exports/{id}/` | Update export (admin-only) | -| DELETE | `/api/v2/exports/{id}/` | Delete export and its file from storage | - -Permissions: `ObjectPermission` (`ami/base/permissions.py`). Researcher role can create and delete. Admin can update. Basic members and non-members cannot create. - -### Creating an Export (POST) - -**Required fields:** -- `project` (int) — Project PK -- `format` (string) — One of: `"occurrences_simple_csv"`, `"occurrences_api_json"`, `"dwca"` - -**Optional fields:** -- `filters` (object) — Filter criteria applied to occurrences - - `collection_id` (int) — Restrict to occurrences whose detections link to images in this `SourceImageCollection` - -**Validation** (`views.py:30-86`): -1. Format checked against `ExportRegistry.get_supported_formats()` -2. If `collection_id` provided, validates existence and project ownership -3. Object-level permission check on unsaved instance before persisting -4. Creates `DataExportJob` and enqueues via Celery - -**Response:** 201 with serialized `DataExport` including nested `job` object. - -### Response Fields - -Defined in `DataExportSerializer` (`ami/exports/serializers.py:30`): - -``` -id, user, project, format, filters, filters_display, -job {id, name, project, progress, result}, -file_url, record_count, file_size, file_size_display, -created_at, updated_at -``` - -- `file_url` — null until export completes, then absolute URL to file -- `file_size_display` — human-readable (e.g. 
"2.4 MB") -- `filters_display` — auto-populated with human names (e.g. collection name) -- `job.progress` — tracks export stages with percentage - -### Polling for Completion - -Exports run asynchronously. Poll `GET /api/v2/exports/{id}/` and check: -- `job.progress` for stage updates -- `file_url` becomes non-null when export is ready for download - -## Registered Formats - -Registered in `ami/exports/registry.py:28-30`, implemented in `ami/exports/format_types.py`: - -| Key | Class | Output | Description | -|-----|-------|--------|-------------| -| `occurrences_simple_csv` | `CSVExporter` (:149) | `.csv` | Tabular occurrence data with detection fields | -| `occurrences_api_json` | `JSONExporter` (:39) | `.json` | Full API serialization of occurrences | -| `dwca` | `DwCAExporter` (:192) | `.zip` | Darwin Core Archive with event.txt + occurrence.txt + meta.xml + eml.xml | - -## Filter System - -All exporters inherit `OccurrenceCollectionFilter` from `BaseExporter.get_filter_backends()` (`base.py:42-45`). 
- -**OccurrenceCollectionFilter** (`ami/main/api/views.py:981-998`): -- Accepts `collection_id` or `collection` query param -- Filters: `queryset.filter(detections__source_image__collections=collection_id).distinct()` -- No-op when param is absent — unfiltered exports work unchanged - -**How filters are applied in Celery context** (`ami/exports/utils.py`): -- `generate_fake_request()` creates a mock DRF Request with filter values as query params -- `apply_filters()` runs each filter backend's `filter_queryset()` against the exporter's queryset -- Called in `BaseExporter.__init__()` so `self.queryset` is already filtered before `export()` runs - -## DwC-A Specifics - -The DwC-A exporter produces two data files linked by `eventID`: - -- **event.txt** — Events derived from filtered occurrences (`get_events_queryset()` at `format_types.py:211`) -- **occurrence.txt** — Filtered occurrences with Darwin Core terms - -Events are not fetched independently — they're derived from `self.queryset.values_list("event_id").distinct()` to maintain referential integrity when filters are active. - -Field definitions: `ami/exports/dwca.py` — `EVENT_FIELDS` (:26), `OCCURRENCE_FIELDS` (:57). -See `docs/claude/dwca-format-reference.md` for Darwin Core term mappings. - -## Job Integration - -`DataExportJob` (`ami/jobs/models.py:682-716`): -1. Adds "Exporting data" progress stage -2. Calls `job.data_export.run_export()` -3. Adds "Uploading snapshot" stage with file URL -4. Finalizes job as SUCCESS - -`DataExport` has a OneToOne relation to `Job` via `job.data_export` (`models.py:841`). - -## File Lifecycle - -1. Exporter writes to temp file during `export()` -2. `DataExport.save_export_file()` uploads to `exports/` in default_storage (S3/MinIO) -3. `file_url` saved on model -4. On `DataExport` deletion: `pre_delete` signal (`ami/exports/signals.py:13`) removes file from storage
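
The polling flow described in the export reference (poll `GET /api/v2/exports/{id}/` until `file_url` becomes non-null) can be sketched roughly as below. This is a minimal illustration, not the project's client code: `wait_for_export`, `fetch_export`, `fake_fetch`, and the example URL are hypothetical names introduced here, and the HTTP call is replaced by an injected callable so the sketch runs without a server. Only the `file_url` field comes from the documented response. Failure handling is left out because the reference does not say how a failed job is reported.

```python
import time
from typing import Any, Callable, Dict


def wait_for_export(
    fetch_export: Callable[[int], Dict[str, Any]],
    export_id: int,
    interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll an export until its file_url is non-null, then return the URL.

    fetch_export is assumed to return the parsed JSON body of
    GET /api/v2/exports/{id}/ (hypothetical helper, supplied by the caller).
    """
    deadline = time.monotonic() + timeout
    while True:
        export = fetch_export(export_id)
        file_url = export.get("file_url")
        if file_url:
            # The documented signal that the export is ready for download.
            return file_url
        if time.monotonic() >= deadline:
            raise TimeoutError(f"export {export_id} not ready after {timeout}s")
        sleep(interval)


# Stubbed responses standing in for three successive GET requests.
_responses = iter(
    [
        {"id": 7, "file_url": None},
        {"id": 7, "file_url": None},
        {"id": 7, "file_url": "https://example.org/media/exports/demo_export-7.zip"},
    ]
)


def fake_fetch(export_id: int) -> Dict[str, Any]:
    """Hypothetical fetcher: returns the next canned response."""
    return next(_responses)


url = wait_for_export(fake_fetch, 7, interval=0.0, sleep=lambda s: None)
print(url)
```

Injecting the fetcher and the sleep function keeps the loop independent of any particular HTTP library or authentication scheme; a real caller would pass a function that performs the authenticated GET and returns the decoded JSON.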