Add per-asset dataStandard, HED standard, and extensions to StandardsType#371
Add per-asset dataStandard, HED standard, and extensions to StandardsType#371yarikoptic wants to merge 3 commits intomasterfrom
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #371 +/- ##
==========================================
+ Coverage 97.92% 97.94% +0.02%
==========================================
Files 18 18
Lines 2405 2431 +26
==========================================
+ Hits 2355 2381 +26
Misses 50 50
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| dataStandard: Optional[List[StandardsType]] = Field( | ||
| None, | ||
| description="Data standard(s) applicable to this asset.", | ||
| json_schema_extra={"readOnly": True}, |
There was a problem hiding this comment.
readOnly should set to true only in the case that the dataStandard field is assigned and managed solely by the server per https://json-schema.org/draft/2020-12/draft-bhutton-json-schema-validation-00#rfc.section.9.4. Is it the case that the dataStandard field is intended to be assigned and managed solely by the server?
There was a problem hiding this comment.
I would guess the answer is no -- that it is intended to be extracted from the source files by the DANDI CLI running client-side.
There was a problem hiding this comment.
INTERESTING! We also have readOnly on above "TODO" attributes which I guess nobody ever assigned to any asset. Although we are indeed to populate it from client (or on server) but since we have other similar wasGeneratedBy also editable, I see no point in making this readOnly, and I guess it is really our own thing, hence
| json_schema_extra={"readOnly": True}, | |
| json_schema_extra={"nskey": DANDI_NSKEY}, |
right?
There was a problem hiding this comment.
Good point — this is populated client-side by dandi-cli, not server-managed, so readOnly is incorrect. Replaced with "nskey": DANDI_NSKEY in 702f464 RF: address PR review - replace readOnly with nskey, fix extensions field.
|
@yarikoptic this looks great! It will be exciting to be able to query the archive for NWB version and extension usage for data going forward. I agree with @candleindark that this probably should not be readOnly. Other than that, looks good. It would be great to also see a draft of dandi-cli that extracts this information. I'd prefer to have a draft of that before merging this, in case we are able to catch any edge cases. |
It is there waiting for you |
candleindark
left a comment
There was a problem hiding this comment.
Left some minor suggestions for change
| dataStandard: Optional[List[StandardsType]] = Field( | ||
| None, | ||
| description="Data standard(s) applicable to this asset.", | ||
| json_schema_extra={"readOnly": True}, |
There was a problem hiding this comment.
Shouldn't we have an "nskey" key in the json_schema_extra dict? If dataStandard is not defined in a known ontology, we I think we should use ours, i.e.,
json_schema_extra={"nskey": DANDI_NSKEY}
There was a problem hiding this comment.
Agreed — added "nskey": DANDI_NSKEY in 702f464 RF: address PR review - replace readOnly with nskey, fix extensions field.
There was a problem hiding this comment.
it should still have readOnly as this is a metadata element that should not be editable using the UI or API. it is also filled in on the server side.
There was a problem hiding this comment.
IIRC that readOnly only for web ui entry -- we are the ones to provide metadata records via API while uploading assets. So if we forbid that in API, nothing would fill that in AFAIK.
webUI does not ATM care about any asset level metadata entry, so all that readOnly is of no effect and somewhat inconsistent ATM anyways so I would not split hairs over it if someone pushes 'readOnly' there
There was a problem hiding this comment.
i'm pretty positive asset summary is filled by the server validation.
There was a problem hiding this comment.
sorry didn't realize this is asset level. i was thinking dandiset level. however, there are a set of fields that the server unsets when uploaded.
| description="Version of the standard used.", | ||
| json_schema_extra={"nskey": "schema"}, | ||
| ) | ||
| extensions: Optional[List["StandardsType"]] = Field( |
There was a problem hiding this comment.
| extensions: Optional[List["StandardsType"]] = Field( | |
| extensions: Optional[List[StandardsType]] = Field( |
With from __future__ import annotations at the beginning of the file, you don't need the type to be wrapped in a string.
There was a problem hiding this comment.
Right — removed the quotes since from __future__ import annotations handles it. Fixed in 702f464 RF: address PR review - replace readOnly with nskey, fix extensions field.
| extensions: Optional[List["StandardsType"]] = Field( | ||
| None, | ||
| description="Extensions to the standard used in this dataset " | ||
| "(e.g. NWB extensions like ndx-*, HED library schemas).", |
There was a problem hiding this comment.
Shouldn't we set "nskey" key in the json_schema_extra dict? If extensions is not defined in a known ontology, we I think we should use ours, i.e.,
json_schema_extra={"nskey": DANDI_NSKEY}
There was a problem hiding this comment.
Agreed — added "nskey": DANDI_NSKEY to extensions in 702f464 RF: address PR review - replace readOnly with nskey, fix extensions field.
| identifier="DOI:10.25504/FAIRsharing.9af712", | ||
| ).model_dump(mode="json", exclude_none=True) | ||
|
|
||
| hed_standard = StandardsType( |
There was a problem hiding this comment.
Defining hed_standard in lower case is consistent with the established pattern, as in ome_ngff_standard. However, all the standard variables, nwb_standard, bids_standard, ome_ngff_standard, and hed_standard should be specified in upper case laters since they function as constants.
I think that should be done in a separate PR though.
There was a problem hiding this comment.
Agreed — keeping lowercase here for consistency with the existing pattern, and the rename to UPPER_CASE can be done in a separate PR as you suggest.
|
I approve the dandi dli PR. Once we respond to @candleindark 's comments, this is good to go from my end |
…s on StandardsType - Bump DANDI_SCHEMA_VERSION to 0.8.0, add 0.7.0 to ALLOWED_INPUT_SCHEMAS - Add `dataStandard: Optional[List[StandardsType]]` to BareAsset for per-asset standard declarations (NWB, BIDS, HED, OME/NGFF) - Add `version: Optional[str]` to StandardsType - Add `extensions: Optional[List["StandardsType"]]` (self-referencing) to StandardsType for NWB ndx-* extensions, HED library schemas, etc. - Add `hed_standard` constant (RRID:SCR_014074) - Collect per-asset dataStandard in aggregate_assets_summary(), with deprecated path/encoding heuristic fallbacks (remove after 2026-12-01) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…cklist Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ield Address @candleindark's review comments on PR #371: - BareAsset.dataStandard: replace readOnly=True with nskey=DANDI_NSKEY (field is populated client-side by dandi-cli, not server-managed) - StandardsType.extensions: add nskey=DANDI_NSKEY, fix description ("in this dataset" -> removed, since standard applies per-asset), and remove unnecessary string quotes on forward ref (from __future__ import annotations handles it) Co-Authored-By: Claude Code 2.1.104 / Claude Opus 4.6 <noreply@anthropic.com>
| @@ -880,6 +897,11 @@ class StandardsType(BaseType): | |||
| identifier="DOI:10.25504/FAIRsharing.9af712", | |||
| ).model_dump(mode="json", exclude_none=True) | |||
There was a problem hiding this comment.
@kabilar please add additional constructs for anything LINC related -- like those for bigtiff etc, and then here
as to use them in dandi-cli
Summary
dataStandard: Optional[List[StandardsType]]field toBareAssetso dandi-cli can declare per-asset data standards (NWB, BIDS, HED, OME/NGFF) that flow through toAssetsSummaryduring aggregationversion: Optional[str]field toStandardsTypeto track standard versions (e.g. NWB 2.7.0, BIDS 1.9.0, HED 8.2.0)extensions: Optional[List[StandardsType]]self-referencing field toStandardsTypefor standard extensions (NWB ndx-* extensions, HED library schemas)hed_standardconstant alongside existingnwb_standard,bids_standard,ome_ngff_standarddataStandardentries inaggregate_assets_summary(), with deprecated heuristic fallbacks for older clients (remove after 2026-12-01)dataStandardinmigrate()_FIELDS_INTRODUCEDfor downgrade pruningCLAUDE.mdwith architecture docs and schema change checklistTest plan
test_hed_standard_structure— verifies hed_standard has correct name, identifier, schemaKeytest_aggregate_per_asset_datastandard— HED declared per-asset flows to summarytest_aggregate_per_asset_datastandard_no_duplication— BIDS not duplicated when declared both per-asset and via heuristicPR in dandi-cli:
🤖 Generated with Claude Code