
azure-ai-projects: replace hard-coded skip lists with .gitignore-based filtering in _upload_folder_to_blob #46037

Draft
Copilot wants to merge 2 commits into feature/azure-ai-projects/2.0.2 from copilot/update-upload-folder-respect-gitignore

Copilot AI commented Apr 1, 2026

_upload_folder_to_blob used hard-coded skip_dirs / skip_extensions sets (__pycache__, .venv, *.pyc, etc.). Users had no way to control what gets uploaded without modifying SDK code.

Changes

  • pyproject.toml — add pathspec>=0.9.0 runtime dependency.
  • operations/_patch_evaluators.py — two new module-level helpers and an updated upload loop:
    • _load_gitignore_spec(root) — reads root/.gitignore and root/.git/info/exclude; returns a compiled pathspec.PathSpec. When neither file exists it returns an empty spec, which ignores nothing.
    • _is_ignored(spec, root, path) — tests a path against the spec using its POSIX path relative to root.
    • _upload_folder_to_blob now prunes dirs[:] in place and skips files that match the spec. .git is always excluded as a safety guard, regardless of .gitignore contents.
  • aio/operations/_patch_evaluators_async.py — imports and reuses the same helpers from the sync module; identical walk/filter logic with await.
  • Tests (sync + async): test_upload_skips_pycache_and_pyc_files is updated to provide a .gitignore; four new tests per variant cover: ignored directory not traversed, ignored file not uploaded, all files uploaded when there is no .gitignore, and .git always excluded.

Note on .pyc/__pycache__ exclusion: these are no longer excluded by default. Users should add them to their .gitignore (as GitHub's default Python template does). The .git directory remains a hard exclusion.

Warning

Firewall rules blocked me from connecting to one or more addresses.

I tried to connect to the following addresses, but was blocked by firewall rules:

  • https://api.github.com/graphql
    • Triggering command: /usr/bin/gh gh auth status (http block)
    • Triggering command: /usr/bin/gh gh pr create --title "Use .gitignore-based filtering in _upload_folder_to_blob for azure-ai-projects evaluators" --body "## Summary …" (http block)

  • pypi.org
    • Triggering command: /home/REDACTED/work/azure-sdk-for-python/azure-sdk-for-python/.venv/bin/pip pip install pathspec -q (dns block)
    • Triggering command: /home/REDACTED/work/azure-sdk-for-python/azure-sdk-for-python/.venv/bin/python /home/REDACTED/work/azure-sdk-for-python/azure-sdk-for-python/.venv/bin/python /home/REDACTED/work/azure-sdk-for-python/azure-sdk-for-python/.venv/lib/python3.9/site-packages/pip/__pip-REDACTED__.py install --ignore-installed --no-user --prefix /tmp/pip-build-env-76iq39iq/overlay --no-warn-script-location --no-binary :none: --only-binary :none: -i REDACTED -- setuptools>=77.0.3 wheel (dns block)
    • Triggering command: /home/REDACTED/work/azure-sdk-for-python/azure-sdk-for-python/.venv/bin/python /home/REDACTED/work/azure-sdk-for-python/azure-sdk-for-python/.venv/bin/python /home/REDACTED/work/azure-sdk-for-python/azure-sdk-for-python/.venv/lib/python3.9/site-packages/pip/__pip-REDACTED__.py install --ignore-installed --no-user --prefix /tmp/pip-build-env-5cme9dwm/overlay --no-warn-script-location --no-binary :none: --only-binary :none: -i REDACTED -- setuptools>=77.0.3 wheel (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Original prompt

Goal: Update sdk/ai/azure-ai-projects/azure/ai/projects/operations/_patch_evaluators.py so that _upload_folder_to_blob respects .gitignore patterns (and possibly .git/info/exclude) instead of using hard-coded skip_dirs and skip_extensions.

Context:

Current behavior:

  • Skips directories: __pycache__, .git, .venv, venv, node_modules
  • Skips file extensions: .pyc, .pyo

Desired behavior:

  • Use .gitignore rules from the uploaded folder root to decide which paths to skip during upload.
  • Ensure we do not traverse ignored directories and do not upload ignored files.
  • Still ensure .git itself is never uploaded, even if .gitignore is missing or doesn’t include it (safety).
  • Maintain current logging and error behavior.

Implementation requirements:

  • Add a lightweight dependency to parse gitignore patterns (prefer pathspec with gitwildmatch).
  • Implement helper(s) to load patterns and test a path relative to the folder root.
  • Integrate into os.walk loop by mutating dirs[:] and filtering files.
  • Handle missing .gitignore gracefully (upload everything except safety exclusions).
  • Add/adjust tests if this package has unit tests for upload behavior; otherwise add minimal tests that validate:
    • ignored directory is not traversed
    • ignored file is not uploaded
    • non-ignored file is uploaded
    • .pyc ignored by default only if .gitignore covers it (or decide to keep .pyc excluded as a safety default; document decision).

Update packaging:

  • Add dependency to the appropriate requirements/pyproject/setup config for the azure-ai-projects package.

Deliverables:

  • Code changes implementing gitignore-based filtering.
  • Any necessary dependency metadata changes.
  • Tests updated/added to cover new behavior.

The following is the prior conversation context from the user's chat exploration (may be truncated):

User: I have this code. How can I use .gitignore instead of this?

lines [61:62]
skip_dirs = {"__pycache__", ".git", ".venv", "venv", "node_modules"}
skip_extensions = {".pyc", ".pyo"}

Assistant: You can replace the hard-coded skip_dirs / skip_extensions by honoring the repo’s .gitignore: parse it with a gitignore-aware library and filter paths before upload.

Recommended approach (practical + predictable)

  1. Load ignore rules from:
    • the .gitignore file in the folder being uploaded (and optionally the repository-local excludes in .git/info/exclude)
  2. While walking the tree, skip:
    • directories that match ignore rules (so you don’t descend into them)
    • files that match ignore rules (so you don’t upload them)

In Python, the most straightforward way is to use a library that implements gitignore semantics, e.g. pathspec (commonly used for this). Pseudocode (showing the shape of the change):

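from pathlib import Path

import pathspec


def _load_gitignore_spec(root: Path) -> pathspec.PathSpec:
    # A missing .gitignore yields an empty spec, which ignores nothing.
    gitignore = root / ".gitignore"
    patterns = []
    if gitignore.is_file():
        patterns = gitignore.read_text(encoding="utf-8").splitlines()
    return pathspec.PathSpec.from_lines("gitwildmatch", patterns)


def _is_ignored(spec: pathspec.PathSpec, root: Path, path: Path) -> bool:
    # gitignore patterns are matched against POSIX paths relative to root.
    rel = path.relative_to(root).as_posix()
    # Directory-only patterns such as "build/" match only with a trailing slash.
    if path.is_dir():
        rel += "/"
    return spec.match_file(rel)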

Then inside _upload_folder_to_blob:

  • build spec = _load_gitignore_spec(Path(folder))
  • in os.walk, prune dirs[:] in place, keeping only entries d for which _is_ignored(spec, Path(folder), Path(root) / d) is false
  • for each file, skip if _is_ignored(...) is true

That gives you behavior consistent with .gitignore, and users can control what gets uploaded without changing SDK code.

Two important constraints to check first

  • This repo/package policy: Azure SDK for Python generally avoids adding new runtime dependencies unless there’s a strong justification (size/maintenance/security). If azure-ai-projects can’t take a new dependency, you’ll need an alternative.
  • What .gitignore should mean here: .gitignore ignores untracked files by default, but it can also match tracked files. For uploading, you probably want “exclude anything that .gitignore would ignore”, regardless of whether it’s tracked.

If you want: I can propose a dependency-free fallback

If adding pathspec isn’t acceptable, the next-best options are:

  • use git ls-files -co --exclude-standard to enumerate upload candidates (requires git CLI present), or
  • implement a limited subset of gitignore rules (usually not recommended; edge cases pile up).

If you tell me whether adding a small dependency is allowed for azure-ai-projects in this branch, I can sketch the exact implementation you’d drop into `_upl...

This pull request was created from Copilot chat.

- Add pathspec>=0.9.0 to pyproject.toml dependencies
- Add _load_gitignore_spec() and _is_ignored() module-level helpers in
  _patch_evaluators.py; reuse them in _patch_evaluators_async.py
- Replace hard-coded skip_dirs/skip_extensions with gitignore-based
  filtering; .git is always excluded as a safety guard
- Update and extend unit tests for both sync and async code paths

Agent-Logs-Url: https://github.com/Azure/azure-sdk-for-python/sessions/7c2df3db-ebbe-4987-86ca-07b8339f05c5

Co-authored-by: w-javed <7674577+w-javed@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Update _upload_folder_to_blob to respect .gitignore patterns" to "azure-ai-projects: replace hard-coded skip lists with .gitignore-based filtering in _upload_folder_to_blob" on Apr 1, 2026
Copilot AI requested a review from w-javed April 1, 2026 03:06