Skip to content

fix: sampler docs#125

Open
ilan-gold wants to merge 46 commits intomainfrom
ig/docs_sampler
Open

fix: sampler docs#125
ilan-gold wants to merge 46 commits intomainfrom
ig/docs_sampler

Conversation

@ilan-gold
Copy link
Copy Markdown
Collaborator

No description provided.

@ilan-gold ilan-gold added the skip-gpu-ci Whether gpu ci should be skipped label Jan 28, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented Jan 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.80%. Comparing base (c9a5971) to head (a4c94e9).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #125   +/-   ##
=======================================
  Coverage   90.80%   90.80%           
=======================================
  Files          10       10           
  Lines         805      805           
=======================================
  Hits          731      731           
  Misses         74       74           
Files with missing lines Coverage Δ
src/annbatch/abc/sampler.py 100.00% <ø> (ø)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@ilan-gold ilan-gold requested a review from felix0097 January 28, 2026 09:25
Copy link
Copy Markdown
Collaborator

@felix0097 felix0097 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would definitely add an example here @selmanozleyen on how to actually implement this. One very simple example:

  1. How do I write out my own sampler (one example class, e.g. weighted sampler, or fully random sampler)
  2. How do I plug this into the rest of your code base e.g. how to actually use this

Moreover, did you check how this behaves on the full Tahoe dataset for e.g. fully random sampling. With the old code the memory usage blew up significantly. If this is still the case we should add a warning/caveats section

cc @ilan-gold

@felix0097
Copy link
Copy Markdown
Collaborator

@ilan-gold & @selmanozleyen I've added a detailed doc page on how to implement a custom sampler. Feel free to comment/edit

@felix0097
Copy link
Copy Markdown
Collaborator

ToDo: Update docs to reflect #127

Comment thread docs/custom-sampler.md Outdated
Comment thread docs/custom-sampler.md Outdated
Comment thread docs/custom-sampler.md
Comment thread docs/custom-sampler.md Outdated
Comment thread docs/custom-sampler.md Outdated
felix0097 and others added 4 commits January 29, 2026 15:03
Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>
Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>
Comment thread docs/custom-sampler.md Outdated
Comment thread docs/custom-sampler.md Outdated
Comment thread docs/custom-sampler.md Outdated
Comment thread docs/custom-sampler.md
│ 0-99 │ 100-199 │ 200-299 │ 300-399 │ 400-499 │ 500-599 │ 600-699 │ 700-799 │ 800-899 │ 900-999 │
└─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┘

LoadRequest with chunks = [slice(200,300), slice(700,800), slice(0,100), slice(500,600)]:
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe good to unalign the chunk boundaries to show that alignment with virtual chunk boundaries is not necessary?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a comment to clarify this 👍

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these should be unaligned still, which I am now more convinced of because it's a virtual concatenation of datasets, not zarr chunks (see below)

Comment thread docs/custom-sampler.md Outdated
Comment thread docs/custom-sampler.md Outdated
Comment thread docs/custom-sampler.md Outdated
Comment thread docs/custom-sampler.md Outdated
Comment thread docs/custom-sampler.md Outdated
Comment thread README.md Outdated
felix0097 and others added 13 commits February 5, 2026 12:57
Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>
Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>
Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>
Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>
Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>
Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>
Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>
Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>
Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>
Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>
Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>
felix0097 and others added 16 commits February 5, 2026 16:33
Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>
Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>
Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>
Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>
Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>
Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>
Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>
Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>
Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>
Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>
Comment thread docs/custom-sampler.md Outdated
Comment thread docs/custom-sampler.md Outdated
Comment thread docs/custom-sampler.md Outdated
Comment thread docs/custom-sampler.md Outdated
Comment thread docs/custom-sampler.md Outdated
Comment thread docs/custom-sampler.md Outdated
Comment thread docs/custom-sampler.md
┌─────────────────────────────────────────────────────────────────────────┐
│ 0 1 2 3 ... 99 100 101 ... 199 200 ... 299 300 ... 399 │
│ │
│ [Chunk 200-299] [Chunk 700-799] [Chunk 0-99] [Chunk 500-599] │
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another reason I'd like the chunks to be a bit more complex - we order in-memory by on-disk dataset. So Chunk 0-99 definitely goes first. We could loosen things but that's how it works right now

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread docs/custom-sampler.md

Batch 1 (4 observations):
┌───────────────────────────────────────────────────────────────────┐
│ indices [0, 50, 150, 250] │
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still think these should be actually random indices

Comment thread docs/custom-sampler.md Outdated
Comment thread docs/custom-sampler.md
│ │
│ Disk (sequential reads per chunk) Memory (shuffled together) │
│ ┌───────────────┐ ┌──────────────────────┐ │
│ │ Chunk 0: 0-3 │ ═══════════╗ │ 8 2 11 0 5 9 │ │
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here again, would do Dataset 0 virtual-concatenation indices: 0-3

ilan-gold and others added 8 commits February 6, 2026 13:48
Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>
Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>
Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>
Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>
Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>
Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>
Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>
@felix0097 felix0097 self-assigned this Feb 10, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

skip-gpu-ci Whether gpu ci should be skipped

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants