Skip to content

Add git-annex support for upload with auto-detection#1797

Draft
Copilot wants to merge 7 commits intomasterfrom
copilot/add-auto-datalad-support
Draft

Add git-annex support for upload with auto-detection#1797
Copilot wants to merge 7 commits intomasterfrom
copilot/add-auto-datalad-support

Conversation

Copy link
Contributor

Copilot AI commented Feb 13, 2026

Enables linking local uploaded files with their remote URLs via git-annex, and validates that annexed files use the SHA256E backend standard.

Changes

CLI

  • Added --git-annex [yes|no|auto] option (default: auto)
    • auto: detects git-annex repository (checks for git-annex branch and annex.uuid config) and enables if found (default)
    • yes: always registers URLs with git-annex post-upload
    • no: disables git-annex integration

Core Implementation

  • Post-upload hook polls for contentUrl (up to 20s with 2s intervals) from asset metadata and calls git annex addurl --relaxed --file <path> <url>
  • URL selection: prefers S3 URLs for public dandisets; falls back to first available URL
  • Backend validation: warns if annexed file backend ≠ SHA256E (dandiset standard)
  • Error handling: git-annex failures now raise with helpful hints to use --git-annex no
  • Detection utilities:
    • _is_git_annex_repo(): checks for git-annex branch and annex.uuid config
    • _get_file_annex_backend(): parses symlink target filename for backend key (validates annex/objects path structure)
    • _register_url_with_annex(): spawns git-annex subprocess

Example

# After upload completes, if --git-annex=auto (default) and git-annex repo detected:
# 1. Checks if file backend is SHA256E (warns if not)
# 2. Polls for contentUrl (up to 20s): ["https://s3.amazonaws.com/..."]
# 3. Selects S3 URL (preferred for public access)
# 4. Runs: git annex addurl --relaxed --file path/to/file.nwb https://s3...

Tests

  • Consolidated test test_git_annex_repo_ops covers enum, detection, and backend parsing
  • Tests skip gracefully when git-annex unavailable
Original prompt

This section details on the original issue you should resolve

<issue_title>add "auto" datalad support for upload</issue_title>
<issue_description>to link local uploaded files with remote URLs obtained from metadata record for the uploaded file.

then eventually it would be interesting

also to warn if backend differs from our dandiset backends -- which is SHA256E

came up in dialog with @CodyCBakerPhD </issue_description>

Comments on the Issue (you are @copilot in this section)


💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

Copilot AI and others added 5 commits February 13, 2026 20:17
- Add DataladMode enum with yes/no/auto options
- Add --datalad CLI option to upload command
- Add utility functions to detect git-annex repos and register URLs
- Add backend checking to warn if not SHA256E
- Integrate datalad URL registration after successful upload

Co-authored-by: yarikoptic <39889+yarikoptic@users.noreply.github.com>
- Add test_datalad_mode_enum to verify enum values
- Add test_is_git_annex_repo for repository detection
- Add test_get_file_annex_backend for backend checking
- Register ai_generated marker in tox.ini
- Tests skip gracefully when git-annex is not available

Co-authored-by: yarikoptic <39889+yarikoptic@users.noreply.github.com>
- Fix dandiset.path type error (use path_obj instead)
- Fix line length issue in debug message
- All mypy and flake8 checks pass

Co-authored-by: yarikoptic <39889+yarikoptic@users.noreply.github.com>
- Improve _get_file_annex_backend to verify annex directory structure
- Add detailed comment about contentUrl array usage
- Use tuple for DataladMode choices in CLI
- Keep __str__ method as it's needed for proper string conversion

Co-authored-by: yarikoptic <39889+yarikoptic@users.noreply.github.com>
- Change from list to tuple comprehension to avoid creating new list on each import
- More efficient and follows best practices

Co-authored-by: yarikoptic <39889+yarikoptic@users.noreply.github.com>
Copilot AI changed the title [WIP] Add auto datalad support for upload Add datalad/git-annex support for upload with auto-detection Feb 13, 2026
Copilot AI requested a review from yarikoptic February 13, 2026 20:27
dandi/upload.py Outdated
Comment on lines 78 to 89
# Find git repository root
result = subprocess.run(
["git", "rev-parse", "--show-toplevel"],
cwd=path if path.is_dir() else path.parent,
capture_output=True,
text=True,
check=False,
)
if result.returncode == 0:
repo_root = Path(result.stdout.strip())
annex_dir = repo_root / ".git" / "annex"
return annex_dir.exists()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better check that there is git-annex branch and git config annex.uuid is defined. A folder could be left artifact

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in fbb8785. Now checking for git-annex branch and annex.uuid config to properly validate git-annex initialization.

show_default=True,
)
@click.option(
"--datalad",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems we are not doing anything "datalad" specific and operate directly with git-annex, let's call it --git-annex (and GitAnnexMode) and do 'auto' by default.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in fbb8785. Renamed to --git-annex with GitAnnexMode enum and changed default to auto.

"""Test detection of git-annex repositories."""
import subprocess

from ..upload import _is_git_annex_repo
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imports should be at the top of the module sorted (pre-commit)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in fbb8785. Moved subprocess import to top of test module.

["git", "annex", "init"], cwd=tmp_path, check=True, capture_output=True
)

# Create a test file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fold the tests following after this line into test_is_git_annex_repo and rename it like test_git_annex_repo_ops

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in fbb8785. Consolidated into single test_git_annex_repo_ops that tests enum, detection, and backend operations.

return None

# Find the part that looks like an annex key (contains backend and size)
for part in target_path.parts:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just take the filename

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in fbb8785. Now using Path(target).name to extract just the filename.

dandi/upload.py Outdated
)

if result.returncode == 0:
lgr.info("Successfully registered %s with git-annex", filepath)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
lgr.info("Successfully registered %s with git-annex", filepath)
lgr.info("Successfully registered url %s to %s with git-annex", remote_url, filepath)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in fbb8785. Updated log message to include both URL and filepath.

dandi/upload.py Outdated
return True
else:
lgr.warning(
"Failed to register %s with git-annex: %s",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Failed to register %s with git-annex: %s",
"Failed to register url %s for %s with git-annex: %s", url,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in fbb8785. Updated log message to include URL in the failure message.

except Exception as e:
lgr.warning(
"%s: Error during datalad integration: %s", strpath, e
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove swallow + warn -- let it raise. If doesn't work, user should disable use of git-annex. We might want to hint to use --git-annex no or whatever that option would be.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in fbb8785. Removed try/except wrapper - errors now raise with helpful hint message to use --git-annex no.

"%s: No contentUrl found in asset metadata; "
"skipping datalad registration",
strpath,
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it might take time to get URL registered in backend. So we would need to loop for e.g. up to 20 seconds, reasking each 2 and then failing if none is provided after 20 seconds, and exiting the loop if we successfully registered the URL.

note that there could be multiple URLs, review code in https://github.com/dandi/backups2datalad on dealing with public vs private / embargoed dandisets. For embargoed we need to ensure enabling datalad special remote and associate api based URLs to facilitate authentication. If public -- then public S3 urls.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in fbb8785. Added polling loop (20s timeout, 2s intervals) for contentUrl. Implemented S3 URL preference for public dandisets. Note: embargoed dandisets with datalad special remote authentication would need additional work - for now this handles the public case which is the primary use case.

tox.ini Outdated
[pytest]
addopts = --tb=short --durations=10 --timeout=300
markers =
ai_generated: marks tests as AI-generated (deselect with '-m "not ai_generated"')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

drop this change -- we have it defined already in

dandi/pytest_plugin.py:        "ai_generated",

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in fbb8785. Removed from tox.ini as it's already defined in dandi/pytest_plugin.py.

@yarikoptic
Copy link
Member

also needs a unittest, may be in an extended run of one of CI runs where git-annex does get installed and we do try to upload a file from within git-annex repo (embargoed and not) and verify that URL gets assigned.

@codecov
Copy link

codecov bot commented Feb 13, 2026

Codecov Report

❌ Patch coverage is 22.72727% with 85 lines in your changes missing coverage. Please review.
✅ Project coverage is 50.18%. Comparing base (ceeb898) to head (4eb69e5).
⚠️ Report is 4 commits behind head on master.

Files with missing lines Patch % Lines
dandi/upload.py 1.36% 72 Missing ⚠️
dandi/tests/test_upload.py 63.88% 13 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (ceeb898) and HEAD (4eb69e5). Click for more details.

HEAD has 20 uploads less than BASE
Flag BASE (ceeb898) HEAD (4eb69e5)
unittests 25 5
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #1797       +/-   ##
===========================================
- Coverage   75.10%   50.18%   -24.92%     
===========================================
  Files          84       84               
  Lines       11921    12030      +109     
===========================================
- Hits         8953     6037     -2916     
- Misses       2968     5993     +3025     
Flag Coverage Δ
unittests 50.18% <22.72%> (-24.92%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

- Rename DataladMode to GitAnnexMode and --datalad to --git-annex
- Change default from 'no' to 'auto'
- Improve git-annex repo detection (check for git-annex branch and annex.uuid)
- Use filename for backend extraction
- Improve log messages with URL details
- Add polling loop for contentUrl (20s timeout with 2s intervals)
- Select appropriate URL (prefer S3 for public access)
- Remove exception swallowing - let errors raise with helpful hints
- Consolidate tests into test_git_annex_repo_ops
- Move subprocess import to top of test module
- Remove ai_generated marker from tox.ini (already in pytest_plugin.py)

Co-authored-by: yarikoptic <39889+yarikoptic@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Feb 13, 2026

Just as a heads up, I was blocked by some firewall rules while working on your feedback. Expand below for details.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • rig.mit.edu
    • Triggering command: /home/REDACTED/.local/bin/dandi dandi upload --help (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Copilot AI changed the title Add datalad/git-annex support for upload with auto-detection Add git-annex support for upload with auto-detection Feb 13, 2026
Copilot AI requested a review from yarikoptic February 13, 2026 21:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

add "auto" datalad support for upload

2 participants