fix: parallel generation, storage env vars, MPI topology, settle guard, CI thread cap by russfellows · Pull Request #12 · mlcommons/DLIO_local_changes

russfellows · 2026-04-13T16:30:05Z

Overview

This PR bundles four related improvement areas that together make dlio_benchmark more correct on multi-node clusters, significantly faster at data generation, and usable without a handcrafted YAML file when working with cloud object stores.

All changes are backwards-compatible: default values preserve existing behaviour.

Issues closed: #9, #10, #11, #12, #13 (and clarification #6b)
New tests: 28 tests added in tests/test_remaining_issues.py, all pass.

Change Details

1 — MPI Topology Fix for Thread Auto-Sizing (Issue 12)

Files: dlio_benchmark/utils/utility.py, dlio_benchmark/utils/config.py

Problem: read_threads auto-sizing divided os.cpu_count() by comm_size (total world size). On a 4-node × 16-rank job this produced 1 thread per rank instead of the correct 16-CPU ÷ 4-ranks-per-node = 4 threads per rank.

Fix: Added DLIOMPI.ranks_per_node() — a topology-aware method that returns the number of MPI ranks on the current node. It is safe to call in CHILD_INITIALIZED state (spawned workers) where mpi_ppn_list is not yet set; it falls back to comm_size as a conservative estimate. Both read_threads and write_threads auto-sizing now use ranks_per_node().

Before (4-node, 16-rank job, 64-core nodes):
  read_threads = cpu_count // comm_size = 64 // 16 = 4  ← wrong

After:
  read_threads = cpu_count // ranks_per_node = 64 // 4 = 16  ← correct

2 — Parallel Data Generation (Issues 10, 11, 6b)

Files: dlio_benchmark/utils/config.py, dlio_benchmark/data_generator/data_generator.py

A new write_threads: int field is added to ConfigArguments (default 1, auto-sized like read_threads). DataGenerator._generate_files() is rewritten with a two-phase design:

Phase 1 (main thread, sequential): all (index, dims, file_seed, output_path) tuples are pre-computed from the master RNG to ensure determinism regardless of thread count.
Phase 2 (parallel): each worker gets its pre-computed file_seed and a fresh np.random.default_rng(seed=file_seed) — no shared mutable state. For object stores, N threads = N concurrent uploads (Issue 11).

Determinism guarantee: write_threads=1 and write_threads=8 produce byte-for-byte identical files for the same config and master seed.

A clarifying comment was also added to DataGenerator.__init__() confirming that the early derive_configurations() call is intentional (Issue 6b).

3 — Storage Environment-Variable Overrides (Issue 9)

File: dlio_benchmark/utils/config.py — _apply_env_overrides()

Map well-known environment variables into ConfigArguments fields. Existing YAML / CLI values always win; env vars only fill fields still at their default.

Environment variable	Config field
`DLIO_STORAGE_TYPE`	`args.storage_type`
`DLIO_BUCKET`	`args.storage_root`
`DLIO_STORAGE_LIBRARY`	`args.storage_options['storage_library']`
`AWS_ACCESS_KEY_ID`	`args.storage_options['access_key_id']`
`AWS_SECRET_ACCESS_KEY`	`args.storage_options['secret_access_key']`
`AWS_ENDPOINT_URL`	`args.storage_options['endpoint_url']`
`AWS_REGION`	`args.storage_options['region']`

4 — Post-Generation Settle Guard for Object Stores (Issue 13)

Files: dlio_benchmark/utils/config.py, dlio_benchmark/main.py

New optional configuration parameter:

storage:
  post_generation_settle_seconds: 5.0   # default: 0.0 (disabled)

After generation completes, _apply_settle_guard(args, comm) is called:

No-op when storage_type == LOCAL_FS or settle == 0.0 — zero behaviour change for existing configs.
Active otherwise: rank 0 sleeps for the configured duration, then all ranks barrier together so they proceed once the store has settled.

5 — CI Thread Cap (DLIO_MAX_AUTO_THREADS)

DLIO_MAX_AUTO_THREADS environment variable caps the auto-sized thread count (default 8 in production, set to 2 in CI via .github/workflows/ci.yml and tests/conftest.py). This prevents resource contention when the full test suite runs on shared CI runners.

Tests

New file: tests/test_remaining_issues.py — 28 tests:

Test class	Tests	Covers
`TestIssue12_RanksPerNode`	4	`ranks_per_node()` correctness and safety
`TestIssue12_AutoSizingDenominator`	1	read_threads uses `ranks_per_node()`
`TestIssue10_WriteThreadsField`	3	`write_threads` field, default, auto-size
`TestIssue10_ParallelGeneration`	4	ThreadPoolExecutor used, files created, determinism, Issue 6b comment
`TestIssue9_StorageEnvOverrides`	11	All env vars applied; YAML wins; priority ordering
`TestIssue13_SettleGuard`	5	Field exists, default=0, sleep called for S3, not for LOCAL_FS, not for settle=0

Full suite result: 94 passed, 32 skipped.

Change code to use storage_root config option and namespace. Removes urlparsing for each I/O. Updates some default config options to be sane for both file and object.

- Add StorageLibrary enum to consistently select S3 libraries - Refactor storage_factory to route to selected library backends - Implement MinIO storage backend with MPI rank-based endpoint selection - Implement s3dlio storage backend with native multi-endpoint support - Enable comparison testing across S3 client libraries This enables DLIO benchmarks to test different S3 client implementations for performance comparison and multi-endpoint load balancing strategies.

…rgonne-lcf#325) - Move profiler imports inside get_profiler() method - Benefits: - Avoids loading TFProfiler (which imports tensorflow) unless needed - Reduces import overhead for users not using TENSORBOARD profiler - Default profiler (IOSTAT) no longer triggers tensorflow import - No breaking changes - same API, same behavior

Add a native AIStore storage handler that uses the official AIStore Python SDK for direct access, bypassing the S3 compatibility layer for better performance and simpler configuration. Changes: - Add AIStoreStorage class with full CRUD operations, range reads, and prefix-based object listing - Add StorageType.AISTORE enum and wire it through StorageFactory, GeneratorFactory, and ReaderFactory (reuses S3 generators/readers) - Add AIStore endpoint configuration support in ConfigArguments - Add 'aistore' optional dependency in setup.py - Add mock-based test suite with full AIStore SDK simulation - Add CI workflow for AIStore tests - Add storage configuration section to documentation Supported formats: NPY, NPZ, JPEG Supported frameworks: PyTorch, TensorFlow Signed-off-by: Abhishek Gaikwad <gaikwadabhishek1997@gmail.com>

* fix(counters): train phase was not evaluated PR argonne-lcf#302 moved loop breaking condition from the end of the loop at its start. Which never fires self.stats.end_block of the current block as the iteration never start. Trying regulat pytorch loader from local fs: ``` [OUTPUT] 2026-02-27T06:58:50.214359 Running DLIO [Training & Evaluation] with 2 process(es) [WARNING] The amount of dataset is smaller than the host memory; data might be cached after the first epoch. Increase the size of dataset to eliminate the caching effect!!! [OUTPUT] 2026-02-27T06:58:50.229669 Max steps per epoch: 128 = 1 * 1024 / 4 / 2 (samples per file * num files / batch size / comm size) [OUTPUT] 2026-02-27T06:58:50.229764 Steps per eval: 32 = 1 * 64 / 1 / 2 (samples per file * num files / batch size eval / comm size) [OUTPUT] 2026-02-27T06:58:50.278417 Starting epoch 1: 128 steps expected [OUTPUT] 2026-02-27T06:58:50.278614 Starting block 1 [OUTPUT] 2026-02-27T06:59:03.743752 Ending epoch 1 - 128 steps completed in 13.47 s [OUTPUT] 2026-02-27T06:59:03.747196 Starting eval - 32 steps expected [OUTPUT] 2026-02-27T06:59:07.122980 Ending eval - 32 steps completed in 3.38 s [OUTPUT] 2026-02-27T06:59:07.124598 Epoch 1 [Eval] Accelerator Utilization [AU] (%): 99.4141 [OUTPUT] 2026-02-27T06:59:07.124644 Epoch 1 [Eval] Throughput (samples/second): 18.9592 [OUTPUT] 2026-02-27T06:59:07.130596 Starting epoch 2: 128 steps expected [OUTPUT] 2026-02-27T06:59:07.130832 Starting block 1 [OUTPUT] 2026-02-27T06:59:20.047588 Ending epoch 2 - 128 steps completed in 12.92 s [OUTPUT] 2026-02-27T06:59:20.048553 Starting eval - 32 steps expected [OUTPUT] 2026-02-27T06:59:23.276666 Ending eval - 32 steps completed in 3.23 s [OUTPUT] 2026-02-27T06:59:23.277556 Epoch 2 [Eval] Accelerator Utilization [AU] (%): 99.4022 [OUTPUT] 2026-02-27T06:59:23.277595 Epoch 2 [Eval] Throughput (samples/second): 19.8261 [OUTPUT] 2026-02-27T06:59:23.280422 Starting epoch 3: 128 steps expected [OUTPUT] 2026-02-27T06:59:23.280591 Starting block 1 [OUTPUT] 2026-02-27T06:59:36.196122 Ending epoch 3 - 128 steps completed in 12.92 s [OUTPUT] 2026-02-27T06:59:36.197005 Starting eval - 32 steps expected [OUTPUT] 2026-02-27T06:59:39.425806 Ending eval - 32 steps completed in 3.23 s [OUTPUT] 2026-02-27T06:59:39.426645 Epoch 3 [Eval] Accelerator Utilization [AU] (%): 99.4032 [OUTPUT] 2026-02-27T06:59:39.426682 Epoch 3 [Eval] Throughput (samples/second): 19.8219 [OUTPUT] 2026-02-27T06:59:39.469524 Saved outputs in /lus/flare/projects/DAOS_Testing/PAP166/hydra_log/default/2026-02-27-06-58-50 [OUTPUT] Averaged metric over all steps/epochs [METRIC] ========================================================== [METRIC] Number of Simulated Accelerators: 2 [METRIC] Training Accelerator Utilization [AU] (%): 0.0000 (0.0000) [METRIC] Training Throughput (samples/second): 0.0000 (0.0000) [METRIC] Training I/O Throughput (MB/second): 0.0000 (0.0000) [METRIC] train_au_meet_expectation: fail [METRIC] Eval Accelerator Utilization [AU] (%): 49.7048 (0.0028) [METRIC] Eval Throughput (samples/second): 9.765259 (0.206374) [METRIC] Eval Throughput (MB/second): 0.038146 (0.000806) [METRIC] eval_au_meet_expectation: fail [METRIC] ========================================================== [OUTPUT] 2026-02-27T06:59:39.484237 outputs saved in RANKID_output.json ``` Notice that logs are only show starting of the block and never its ending. After the fix: ``` [OUTPUT] 2026-02-28T12:30:28.000590 Running DLIO [Training & Evaluation] with 2 process(es) [WARNING] The amount of dataset is smaller than the host memory; data might be cached after the first epoch. Increase the size of dataset to eliminate the caching effect!!! [WARNING] Number of files for training in /dataset/train (4000) is more than requested (64). A subset of files will be used [WARNING] Number of files for training in /dataset/train (4000) is more than requested (64). A subset of files will be used [OUTPUT] 2026-02-28T12:30:28.102857 Max steps per epoch: 8 = 1 * 64 / 4 / 2 (samples per file * num files / batch size / comm size) [OUTPUT] 2026-02-28T12:30:28.102992 Steps per eval: 4 = 1 * 8 / 1 / 2 (samples per file * num files / batch size eval / comm size) [OUTPUT] 2026-02-28T12:30:30.572480 Starting epoch 1: 8 steps expected [OUTPUT] 2026-02-28T12:30:30.573084 Starting block 1 [OUTPUT] 2026-02-28T12:30:30.734535 Ending block 1 - 8 steps completed in 0.16 s [OUTPUT] 2026-02-28T12:30:30.740906 Epoch 1 - Block 1 [Training] Accelerator Utilization [AU] (%): 0.1428 [OUTPUT] 2026-02-28T12:30:30.740994 Epoch 1 - Block 1 [Training] Throughput (samples/second): 1753.1357 [OUTPUT] 2026-02-28T12:30:30.741060 Epoch 1 - Block 1 [Training] Computation time per step (second): 0.0000+/-0.0000 (set value: {}) [OUTPUT] 2026-02-28T12:30:30.741497 Ending epoch 1 - 8 steps completed in 0.17 s [OUTPUT] 2026-02-28T12:30:30.742789 Starting eval - 4 steps expected [OUTPUT] 2026-02-28T12:30:30.889307 Ending eval - 4 steps completed in 0.15 s [OUTPUT] 2026-02-28T12:30:30.891985 Epoch 1 [Eval] Accelerator Utilization [AU] (%): 0.0720 [OUTPUT] 2026-02-28T12:30:30.892054 Epoch 1 [Eval] Throughput (samples/second): 54.6620 [OUTPUT] 2026-02-28T12:30:30.900919 Starting epoch 2: 8 steps expected [OUTPUT] 2026-02-28T12:30:30.901249 Starting block 1 [OUTPUT] 2026-02-28T12:30:30.914273 Ending block 1 - 8 steps completed in 0.01 s [OUTPUT] 2026-02-28T12:30:30.915472 Epoch 2 - Block 1 [Training] Accelerator Utilization [AU] (%): 1.9055 [OUTPUT] 2026-02-28T12:30:30.915541 Epoch 2 - Block 1 [Training] Throughput (samples/second): 7765.7316 [OUTPUT] 2026-02-28T12:30:30.915595 Epoch 2 - Block 1 [Training] Computation time per step (second): 0.0000+/-0.0000 (set value: {}) [OUTPUT] 2026-02-28T12:30:30.915931 Ending epoch 2 - 8 steps completed in 0.02 s [OUTPUT] 2026-02-28T12:30:30.917061 Starting eval - 4 steps expected [OUTPUT] 2026-02-28T12:30:30.958733 Ending eval - 4 steps completed in 0.04 s [OUTPUT] 2026-02-28T12:30:30.959729 Epoch 2 [Eval] Accelerator Utilization [AU] (%): 0.0381 [OUTPUT] 2026-02-28T12:30:30.959768 Epoch 2 [Eval] Throughput (samples/second): 192.2493 [OUTPUT] 2026-02-28T12:30:30.960091 Starting epoch 3: 8 steps expected [OUTPUT] 2026-02-28T12:30:30.960275 Starting block 1 [OUTPUT] 2026-02-28T12:30:30.976061 Ending block 1 - 8 steps completed in 0.02 s [OUTPUT] 2026-02-28T12:30:30.977423 Epoch 3 - Block 1 [Training] Accelerator Utilization [AU] (%): 0.6369 [OUTPUT] 2026-02-28T12:30:30.977483 Epoch 3 - Block 1 [Training] Throughput (samples/second): 6020.3520 [OUTPUT] 2026-02-28T12:30:30.977534 Epoch 3 - Block 1 [Training] Computation time per step (second): 0.0000+/-0.0000 (set value: {}) [OUTPUT] 2026-02-28T12:30:30.977792 Ending epoch 3 - 8 steps completed in 0.02 s [OUTPUT] 2026-02-28T12:30:30.978884 Starting eval - 4 steps expected [OUTPUT] 2026-02-28T12:30:30.983803 Ending eval - 4 steps completed in 0.00 s [OUTPUT] 2026-02-28T12:30:30.984927 Epoch 3 [Eval] Accelerator Utilization [AU] (%): 1.3682 [OUTPUT] 2026-02-28T12:30:30.984986 Epoch 3 [Eval] Throughput (samples/second): 1641.1245 [OUTPUT] 2026-02-28T12:30:30.986010 Saved outputs in /home/denis/dev/enakta/dlio_benchmark/hydra_log/default/2026-02-28-12-30-25 [OUTPUT] Averaged metric over all steps/epochs [METRIC] ========================================================== [METRIC] Number of Simulated Accelerators: 2 [METRIC] Training Accelerator Utilization [AU] (%): 0.5939 (0.4129) [METRIC] Training Throughput (samples/second): 4948.3957 (2466.6534) [METRIC] Training I/O Throughput (MB/second): 19.3297 (9.6354) [METRIC] train_au_meet_expectation: fail [METRIC] Eval Accelerator Utilization [AU] (%): 0.4704 (0.5038) [METRIC] Eval Throughput (samples/second): 444.414075 (396.070635) [METRIC] Eval Throughput (MB/second): 1.735992 (1.547151) [METRIC] eval_au_meet_expectation: fail [METRIC] ========================================================== [OUTPUT] 2026-02-28T12:30:30.987839 outputs saved in RANKID_output.json ``` Signed-off-by: Denis Barakhtanov <dbarahtanov@enakta.com> * fix: remove unreachable branch Signed-off-by: Denis Barakhtanov <dbarahtanov@enakta.com> --------- Signed-off-by: Denis Barakhtanov <dbarahtanov@enakta.com> Co-authored-by: Denis Barakhtanov <denis.barahtanov@gmail.com>

…nd (argonne-lcf#329) Every new storage backend required copy-pasting each generator into an _XXX sibling file: npz_generator_s3.py, npy_generator_s3.py and so on. The only difference was whether to write the output locally on disk, directly via numpy/PIL, or via the storage interface. This makes the pattern unsustainable: two duplicated formats today, more with each new backend — incurring a significant maintenance burden. Since all generators already had a storage instance and used it to generate file names, we can leverage it. The only set of generators now can check if the stroage is locally available via `islocalfs` and use some optimisation, if any. If the storage is not local, the sample serializes to io.BytesIO, call buf.getvalue(), and delegate to self.storage.put_data(). All storage backends receive plain bytes as designed by the storage interface, removing type inspection, seek() and getvalue() calls scattered across backends. - FileStorage.put_data was never called, had text-mode open and a double get_uri call (once from the generator, once inside put_data itself). Now it is the default write path for LOCAL_FS, used by almost every workload config. get_data aligned to binary mode ("rb") for consistency. - AIStoreStorage.put_data: remove isinstance dispatch, accept bytes directly. - S3TorchStorage.put_data: remove data.getvalue() — just write data. - generator_factory: removed S3/AIStore branching for NPZ, NPY, JPEG. - factory referenced jpeg_generator_s3.JPEGGeneratorS3 which never existed; JPEG + S3/AIStore would crash at import time. After this patch, adding a new storage backend requires no changes in any generator. Adding a new data format automatically works with all backends. Signed-off-by: Denis Barakhtanov <dbarahtanov@enakta.com> Co-authored-by: Denis Barakhtanov <denis.barahtanov@gmail.com>

…dpsi backends, checkpointing) - Rework s3_torch_storage.py with multi-library S3 support - Enhance all 4 checkpointing modules (pytorch, pytorch_s3, tf, base) - Remove minio_storage.py and s3dlio_storage.py (consolidated) - Add s3_storage_dpsi.py and s3_torch_storage_dpsi.py (new dpsi backends) - Update storage_factory.py, config.py, utility.py, enumerations.py - Update unet3d S3 workload configs - Update jpeg/png data generators and main.py WIP snapshot: 2026-03-18

…age-integration_2026-0318

…-var config - Rename s3_torch_storage.py → obj_store_lib.py (multi-library backend) - Delete s3_torch_storage_dpsi.py (dpsi architecture removed) - storage_factory.py: route S3+PyTorch to ObjStoreLibStorage only (no dpsi branch) - config.py: add _load_dotenv() and _apply_env_overrides() per Hari's recommendation Single location for all os.getenv calls; precedence: YAML > env > .env > defaults Introduces DLIO_OUTPUT_FOLDER env var to redirect output directory - conftest.py: set DLIO_OUTPUT_FOLDER=dlio_test_output for all test runs - dlio_benchmark_test.py: inject output.folder via OmegaConf; clean() uses named dir - dlio_s3_benchmark_test.py: same output dir fix + run_benchmark() OmegaConf injection - dlio_aistore_benchmark_test.py: same fix + add missing OmegaConf import

…tion_2026-0318 feat: multi-library S3 object storage integration (minio, s3dlio, s3torchconnector)

… (v3.0.0-beta) Version bump: 2.0.0 → 3.0.0-beta - setup.py: version 3.0.0, Development Status 4-Beta, add 'parquet' extra requiring pyarrow>=12.0.0 New readers (dlio_benchmark/reader/): - npz_reader_s3_iterable.py: NPZReaderS3Iterable — parallel prefetch of all NPZ files assigned to a DLIO worker thread via s3dlio.get_many() (up to 64 concurrent range GETs) or minio ThreadPoolExecutor; eliminates the serial one-round-trip-per-file penalty of the existing NPZReaderS3 - npy_reader_s3_iterable.py: NPYReaderS3Iterable — mirrors NPZ version for raw numpy files (no key extraction) - parquet_reader_s3_iterable.py: ParquetReaderS3Iterable — row-group-granular parquet reader using HTTP byte-range GETs; opens files by reading only the footer, then fetches individual row groups on demand via s3dlio.get_range() or minio.get_object(offset=, length=); LRU-bounded row-group cache; supports optional column projection via storage_options.columns Adapter classes: _S3RangeFile (s3dlio/s3torchconnector) and _MinioRangeFile provide the seekable file-like interface required by pyarrow.parquet Storage and config fixes: - obj_store_lib.py: remove env-var fallbacks (STORAGE_LIBRARY, DLIO_URI_SCHEME, DLIO_OBJECT_KEY_USE_FULL_URI, DLIO_ENDPOINT_URL) — config.py is now the single source of truth; values must flow through storage_options - obj_store_lib.py: fix list_objects() to use s3dlio.list(uri, recursive=True) with correct prefix stripping (removes double-slash and bucket-prefix issues) - config.py: promote storage.storage_library from top-level storage section into storage_options dict so backends can access it consistently Enumerations: - enumerations.py: add FormatType.PARQUET = 'parquet' and get_enum() branch reader_factory.py: - Route FormatType.NPZ + FormatType.NPY to iterable readers when storage_library is s3dlio, s3torchconnector, or minio - Route FormatType.PARQUET to ParquetReaderS3Iterable All three reader variants support s3dlio, s3torchconnector, and minio as interchangeable storage backends via storage_options.storage_library.

…erable-s3_2026-0319 feat: add parallel S3 iterable readers and parquet byte-range support

Add StorageType.DIRECT_FS = 'direct_fs' to enumerations so that the O_DIRECT backend (via s3dlio direct:// URI) can be selected at runtime. Update storage_factory.py to treat DIRECT_FS identically to LOCAL_FS for DLIO's internal file-listing path; actual I/O is handled by the StreamingCheckpointing layer which routes direct_fs to s3dlio.

…ackends Introduce a new checkpoint type PT_OBJ_SAVE that routes checkpointing through pytorch_obj_store_checkpointing.py, enabling minio and s3dlio as checkpoint storage backends without requiring s3torchconnector. Key changes: - pytorch_obj_store_checkpointing.py: New checkpoint engine using ObjStoreLibStorage for save/load via minio or s3dlio libraries - pytorch_checkpointing.py: Use _streaming_cache dict keyed by backend type; select direct_fs (O_DIRECT) vs file (fadvise) based on storage_type arg - pytorch_s3_checkpointing.py: S3 backend refinements - checkpointing_factory.py: Route PT_OBJ_SAVE to new class - config.py: Fix storage_library-aware validation; convert OmegaConf DictConfig to plain dict before adding dynamic storage_library key

Feat/obj store checkpointing

… image support SUMMARY ======= This commit consolidates a major enhancement to DLIO's S3 object storage support. All changes support three storage libraries (s3dlio, s3torchconnector, minio) with strict per-library isolation — no silent fallback, no cross-library usage. Failing to install the configured library raises ImportError at construction time with a clear pip install hint. CHANGED FILES ============= dlio_benchmark/reader/npy_reader_s3_iterable.py - REPLACED: Rewrote from scratch. Previously, s3torchconnector branch silently called s3dlio.get_many() — wrong library, wrong behavior, wrong docstring. - NEW: Dedicated _prefetch_s3torchconnector() method using S3IterableDataset .from_objects() with S3ReaderConstructor.sequential() — no s3dlio dependency. - NEW: Early ImportError in __init__ if s3torchconnector not installed. - NEW: Strict per-library dispatch in _prefetch(): s3dlio / s3torchconnector / minio each handled explicitly; raises ValueError for unknown library. - NEW: Full module docstring listing all 3 libraries and strict-isolation warning. - FIXED: s3torchconnector env var not set for s3torchconnector (only s3dlio). dlio_benchmark/reader/npz_reader_s3_iterable.py - FIXED: stale docstrings removed ('listing uses s3dlio' was false for s3torchconnector and minio paths). - IMPROVED: _prefetch_s3dlio uses _BytesViewIO + io.BufferedReader to trigger np.load's readinto() path (in-place copy into numpy buffer) rather than bytes() (separate Python allocation). Peak memory: Rust BytesView only. - IMPROVED: _get_minio_client() cached across epochs for TCP keep-alive; urllib3 PoolManager with retry, timeout, maxsize=16 configuration. - IMPROVED: _prefetch_s3torchconnector() uses S3IterableDataset.from_objects() with sequential() reader (matches npy pattern). - IMPROVED: Module docstring accurately describes all 3 libraries. dlio_benchmark/reader/parquet_reader_s3_iterable.py - FIXED CRITICAL: s3torchconnector previously used s3dlio.get_range() and s3dlio.stat() internally — completely wrong, the docstring lied. - NEW: s3torchconnector uses S3Client.get_object() with S3ReaderConstructor.range_based() returning RangedS3Reader (BufferedIOBase with full seek/tell/read/readinto + SEEK_END). Requires s3torchconnector>=1.3.0. - NEW: Early ImportError + RuntimeError (version check for range_based attr) in __init__ for s3torchconnector — fail at construction, not during I/O. - NEW: self._s3torch_client = S3Client cached at construction time. - NEW: _make_range_file() dispatches to native RangedS3Reader for s3torchconnector; _S3RangeFile for s3dlio; _MinioRangeFile for minio. - FIXED: urllib.parse.urlparse import moved to top-level imports (was duplicated inside branches). - FIXED: Module docstring corrected — removed false 'uses s3dlio as engine'. dlio_benchmark/reader/reader_factory.py - NEW: JPEG/PNG storage_type routing — was completely missing, silently sending S3 workloads to ImageReader (local FS reader that calls PIL.Image.open(path)), which would fail hard with a misleading file-not-found error. - NEW: Routes JPEG and PNG on S3/AIStore with recognized storage_library to ImageReaderS3Iterable; falls back to ImageReader for local FS. - UNCHANGED: NPY/NPZ S3 routing (existing, correct). - UNCHANGED: Parquet always routes to ParquetReaderS3Iterable (by design). dlio_benchmark/reader/image_reader_s3_iterable.py [NEW FILE] - NEW: Parallel-prefetch JPEG/PNG reader for S3-compatible object stores. - Inherits from ImageReader (which inherits from FormatReader) — reuses get_sample, next, read_index from parent chain. - Supports all 3 storage libraries with identical pattern to NPYReaderS3Iterable: _prefetch_s3dlio, _prefetch_s3torchconnector, _prefetch_minio. - __init__: early fail for s3torchconnector (ImportError with pip hint). - open(): returns pre-fetched decoded numpy array from cache. - Uses PIL.Image.open() + np.asarray() for decode (to be removed in follow-up refactoring; only image.nbytes is used for telemetry, not the decoded data). dlio_benchmark/storage/obj_store_lib.py - IMPROVED: ObjStoreLibStorage enhanced for s3torchconnector and minio. - NEW: MinIOAdapter to make Minio client compatible with S3Client-like API. - IMPROVED: list_objects(), get_data(), put_data() all dispatch per library. dlio_benchmark/storage/storage_factory.py - ADDED: ObjStoreLibStorage path for S3 + PyTorch framework combination. - ADDED: AIStore support via AIStoreStorage (guarded import, fails with clear error if aistore package not installed). - DEBUG: Temporary debug prints left in for storage routing visibility. dlio_benchmark/utils/config.py - IMPROVED: storage_options propagated through ConfigArguments. - IMPROVED: storage_library field parsing from YAML. dlio_benchmark/utils/utility.py - IMPROVED: gen_random_tensor() uses dgen-py when available for 30-50x speedup. dlio_benchmark/data_generator/npz_generator.py - FIXED: minor generator compatibility improvement. dlio_benchmark/reader/npy_reader_s3.py dlio_benchmark/reader/npz_reader_s3.py - FIXED: minor compatibility fixes (vestigial sequential readers). README.md - MAJOR: Added comprehensive S3/object storage support documentation: overview, per-library install instructions, configuration YAML examples, run commands for all 3 libraries, timing correctness note. docs/DLIO-Object-Storage_Analysis.md [NEW FILE] - NEW: Analysis of DLIO timing loop behavior with object storage. Documents that measurement semantics are unchanged; S3 I/O occurs inside DataLoader worker prefetch, which is correctly inside the timed region. KNOWN ISSUES / FOLLOW-UP (tracked for next commit) =================================================== - Code replication: NPZ/NPY/Image S3 iterable readers share ~150 lines of identical prefetch logic (uri_for_obj_key, _prefetch_s3dlio/s3torch/minio, _prefetch dispatch, next, read_index). Refactoring to _S3IterableMixin planned as immediate follow-up. - numpy/PIL decode overhead: all three readers decode raw bytes to numpy arrays (np.load, PIL.Image.open + np.asarray) solely to get image.nbytes for telemetry. The actual decoded data is NEVER used — FormatReader.next() always yields self._args.resized_image (pre-allocated random tensor). Replacing decode with len(raw_bytes) eliminates unnecessary CPU work. - Parquet factory: no storage_type guard; configuring parquet + local FS silently constructs ParquetReaderS3Iterable which fails confusingly. - storage_factory.py: debug print() statements to be removed. - Old sequential readers (npy_reader_s3, npz_reader_s3): vestigial, factory no longer routes to them for recognized storage_library values.

…erableMixin MOTIVATION ---------- Three reader files (NPZ/NPY/Image) each contained ~250-307 lines of duplicated prefetch logic: per-library dispatch (_prefetch_s3dlio / _prefetch_s3torchconnector / _prefetch_minio), Minio client construction, endpoint env setup, and s3torchconnector import validation. A design review also revealed that all numpy/PIL decode (np.load, PIL.Image.open, np.asarray) inside those prefetch methods was pure CPU overhead whose result was NEVER used — FormatReader.next() always yields self._args.resized_image, a pre-allocated random tensor from config.py, not the actual decoded file data. CHANGES ------- _s3_iterable_mixin.py (new, 328 lines) Shared mixin for all three S3 iterable readers. Contains: - _s3_init(opts): library validation at construction, sets _storage_library / _opts / _object_cache / _minio_client; validates s3torchconnector is importable immediately (not lazily), so misconfiguration fails fast. - _uri_for_obj_key(): s3://bucket/key URI construction. - _get_minio_client(): lazy, cached urllib3.PoolManager + Minio SDK client; reused across epochs to avoid rebuilding TCP connection pools per epoch. - _prefetch_s3dlio(obj_keys) -> {key: len(data)}: parallel get via s3dlio.get_many(); stores ONLY the raw byte count, no numpy decode. - _prefetch_s3torchconnector(obj_keys) -> {key: len}: sequential streaming GET per object via S3IterableDataset.from_objects(); drains the reader with read() for byte count, no numpy decode. - _prefetch_minio(obj_keys) -> {key: len}: ThreadPoolExecutor + Minio.get_object; stores ONLY the raw byte count, no numpy decode. - _prefetch(obj_keys): dispatches to the above three (strict, no fallback). - _s3_prefetch_all(): collects deduplicated obj_keys for the current thread's file_map slice, calls _prefetch(), populates _object_cache. - _s3_ensure_cached(filename): on-demand fetch if filename not in cache. npz_reader_s3_iterable.py (307 lines → 74 lines) Thin subclass of NPZReader + _S3IterableMixin. Overrides: - open(filename): returns self._object_cache.get(filename) (int or None). - close(filename): evicts from cache. - get_sample(filename, sample_index): calls dlp.update(image_size=...) with cached byte count. Does NOT call super() — NPZReader.get_sample() would do open_file_map[filename][..., idx] which fails on an int. - next(): calls _s3_prefetch_all() then delegates to super().next(). - read_index(): calls _s3_ensure_cached() then delegates to super(). npy_reader_s3_iterable.py (254 lines → 107 lines) Thin subclass of NPYReader + _S3IterableMixin. Same override pattern as NPZ. get_sample() does NOT call super() for the same reason (NPYReader.get_sample() also indexes open_file_map[filename][..., sample_index]). image_reader_s3_iterable.py (259 lines → 110 lines) Thin subclass of ImageReader + _S3IterableMixin. Same override pattern. get_sample() additionally calls dft_ai.update(image_size=byte_count) to replicate the second metric update that ImageReader.get_sample() would have performed. Does NOT call super().get_sample() (ImageReader calls .nbytes on the cached value which is an int, not an ndarray). PERFORMANCE IMPACT ------------------ Before: every prefetched object was decoded (np.load / PIL.Image.open / np.asarray), consuming significant CPU time and memory for data that was immediately discarded. After: only len(raw_bytes) is stored. No numpy or PIL imports in any thin subclass. Minio client pooling across epochs reduces TCP setup overhead for all three formats. LINE COUNT SUMMARY ------------------ Before: ~820 lines across 3 files (+ no mixin) After: ~291 lines across 3 files + 328-line mixin = 619 total Net: -201 lines of code, -0 lines of unique logic (all logic is in mixin)

…d thin subclasses - _s3_init(): minio now validates its import EAGERLY at construction time, matching s3dlio (env setup) and s3torchconnector (import check). All three libraries now fail-fast at __init__ instead of deferring minio's failure to the first I/O call. - image_reader_s3_iterable.py: fix stale module docstring that still said 'decodes them with Pillow into a numpy uint8 array' — PIL decode was eliminated in the mixin refactor. - image_reader_s3_iterable.py: remove unused 'import io' and 'import os'. - npy_reader_s3_iterable.py: remove unused 'import os'.

…uting ParquetReader (parquet_reader.py, new) Filesystem counterpart to ParquetReaderS3Iterable. Uses pyarrow natively (no object-storage adapters) with identical logic: - open(): reads parquet footer, builds cumulative row-group offset list - get_sample(): bisect maps sample_index → row_group, LRU cache bounds memory, reports compressed_bytes to dlp profiler - close(): evicts row-group cache entries for the file - Same options as S3 variant: columns, row_group_cache_size under storage_options - finalize(): clears entire row-group cache reader_factory.py (fix) FormatType.PARQUET was unconditionally routing ALL storage types to ParquetReaderS3Iterable — local filesystem parquet workloads would crash because _S3RangeFile tries to call s3dlio.stat() on a local path. Fixed to match the NPY/NPZ/JPEG pattern: S3 / AIStore → ParquetReaderS3Iterable (existing) local / lustre / etc. → ParquetReader (new)

…and unify bucket property - config.py: Remove stale AIStore+PyTorch format restriction (NPZ/NPY only); AIStore reader_factory already routes JPEG/PNG/Parquet to the S3-iterable readers, so the check was blocking valid workloads. - config.py: Remove legacy NPYReaderS3/NPZReaderS3 import validation block; those are the old non-mixin readers, not the path AIStore actually takes. - config.py: Remove [DEBUG LoadConfig] print block (ENTRY + EXIT summaries). - storage_factory.py: Remove all [DEBUG StorageFactory] print statements. - aistore_storage.py: Replace per-method 'if not self.bucket:' guards with a lazy @Property that initialises self._bucket on first access; all methods now simply reference self.bucket and get the cached handle automatically. - aistore_storage.py: Fix isfile() which was missing the bucket guard entirely, causing AttributeError if called before any other storage operation.

The file defined its own 'class S3Storage(DataStorage)' — identical name to s3_storage.py — creating a latent class-name collision if ever imported. It was added in commit 14561b8 as a work-in-progress prototype and was immediately superseded by ObjStoreLibStorage in obj_store_lib.py. Zero callers exist anywhere in the codebase (confirmed via grep).

All [DEBUG ...] prints in __init__, put_data, and get_data are now commented out rather than deleted, so they can be re-enabled easily during local debugging. The one active print (error in list_objects) is a real error message and is left in place.

…re_lib Replaces the 20 commented-out # print(f'[DEBUG ...]') lines with proper logging.debug() calls, keyed off the existing DLIO_LOG_LEVEL infrastructure. Benefits over # print: - Zero-cost when DLIO_LOG_LEVEL != debug (short-circuit before formatting) - Appears automatically in the log file with timestamps - Includes file path + line number in debug mode - Works for users without source access (no code changes needed to enable) The credentials block uses an isEnabledFor(DEBUG) guard so the src_key/ src_sec/src_ep intermediate vars are only computed when debug is active. Enable with: DLIO_LOG_LEVEL=debug dlio_run ...

…-and-cleanup Feat/multi lib storage readers and cleanup

…rchconnector) Add full multi-library checkpoint support with the following changes: pytorch_obj_store_checkpointing.py: - Unified checkpoint writer/reader for s3dlio, minio, and s3torchconnector via storage_library key in the workload YAML - s3dlio multipart tuning: env-var overrides S3DLIO_MULTIPART_PART_SIZE_MB / S3DLIO_MULTIPART_MAX_IN_FLIGHT; production defaults restored to 128 MB x 8 - Documents v0.9.82 regression (blocking semaphore) and GitHub issue argonne-lcf#134 in a large comment block above the s3dlio kwargs section pytorch_s3_checkpointing.py: - Deleted: functionality superseded by pytorch_obj_store_checkpointing.py config.py / enumerations.py: - Recognise storage_library: s3dlio | minio | s3torchconnector from workload YAML - Inject value into storage_options so PyTorchObjStoreCheckpointing can read it - Set correct checkpoint_mechanism and reader_classname per library - Fail fast with clear error if the selected library package is not installed obj_store_lib.py: - Instantiate the correct S3 client based on storage_library selection - s3dlio: PyObjectStoreClient; minio: Minio SDK; s3torch: S3Client _s3_iterable_mixin.py / parquet_reader_s3_iterable.py: - S3 reader cleanups for multi-library correctness tests/dlio_s3_benchmark_test.py: - Update tests to cover multi-library checkpoint paths docs/AIStore_Analysis.md: - New analysis document

feat: multi-library object-store checkpointing fixes

Merge strategy: prefer russfellows/main (HEAD) throughout. Conflicts resolved: - README.md: kept our S3/object-storage install section and extended storage backend description - enumerations.py: kept our version (whitespace-only difference) - npz_generator.py: kept our put_data(path, output) — passes BytesIO object directly to our storage library abstraction - reader_factory.py: kept our storage_library-aware dispatch for NPY/NPZ/Parquet S3 iterable readers - parquet_reader.py: kept our LRU-cached row-group reader (add/add — more sophisticated implementation) - aistore_storage.py: kept ours (newer implementation, add/add) - s3_torch_storage.py: kept deleted (replaced by obj_store_lib.py) - utils/config.py: kept our storage_library-aware S3 validation; relaxed AIStore format restrictions - setup.py: kept our 'parquet' extra with pyarrow>=12.0.0 - tests/dlio_aistore_benchmark_test.py: kept ours (add/add) Auto-merged from mlcommons/main: - data_generator/generator_factory.py - data_generator/parquet_generator.py (new file)

…rs, and framework layer All 8 DLIO benchmark formats (npy, npz, hdf5, parquet, csv, jpeg, png, tfrecord) now work correctly end-to-end against object storage (S3/MinIO/GCS/Azure) via the s3dlio storage library. This required fixes spanning data generators, readers, the TensorFlow framework layer, storage factory, config handling, and a 10-bug root-cause analysis. ## Data generators (data_generator/) ### Base class (data_generator.py) - Added _generate_files(write_fn) template method — eliminates ~15-line loop boilerplate duplicated across all 10 generators - Added _file_seed(i) helper: per-file deterministic seed = BASE_SEED + file_index - Added _extract_dims(i) helper for consistent dimension extraction - Migrated all 10 generators to use _generate_files() template ### Bug: np.random.seed(10) — all MPI ranks produce identical data All generators called np.random.seed(10) unconditionally before their write loop. With MPI, every rank wrote the same data to different files, making distributed generation produce duplicate datasets. Fixed with rank-unique per-file seeding via _file_seed(i). ### Bug: NPZ generator passed BytesIO object instead of bytes npz_generator.py called storage.put_data(path, output) where output was a BytesIO object. Fixed to output.getvalue() to pass actual bytes. ### Added object storage support to 6 generators that had none: - hdf5_generator.py: uses h5py core driver with BytesIO backing - csv_generator.py: uses io.StringIO → encode → put_data - tf_generator.py: uses BytesIO + TFRecord framing - indexed_binary_generator.py: uses BytesIO; also replaced legacy np.random global API with gen_random_tensor() (dgen-py, ~155x faster) - synthetic_generator.py: uses BytesIO - parquet_generator.py: uses pyarrow.BufferOutputStream; also replaced legacy np.random global API with gen_random_tensor() ## Read path (reader/) ### Bug: reader_factory.py routed CSV/HDF5/TFRECORD to wrong readers for s3dlio When storage_library=s3dlio, CSV/HDF5/TFRECORD formats were routed to local-file readers that called open() on S3 URIs. Fixed by adding s3dlio dispatch branches that select the new iterable readers. ### Three new S3 iterable readers: - reader/csv_reader_s3_iterable.py: parallel-prefetch CSV reader using s3dlio.get_object() with ThreadPoolExecutor prefetch - reader/hdf5_reader_s3_iterable.py: parallel-prefetch HDF5 reader using h5py core driver over BytesIO from s3dlio - reader/tfrecord_reader_s3_iterable.py: parallel-prefetch TFRecord reader; no protobuf decode (raw tensor extraction); fixed KeyError: -1 when thread_index=-1 by explicitly collecting all file_map values in that case ## TensorFlow framework layer (framework/tf_framework.py) ### Bug: all 7 storage methods used tf.io.gfile for S3 URIs tf.io.gfile does not support s3dlio-managed endpoints or auth. All 7 methods (read, write, delete, stat, listdir, makedirs, exists) were rewritten to dispatch to s3dlio.* for s3://, gs://, and az:// URIs, falling back to tf.io.gfile for local paths. ## Storage factory (storage/storage_factory.py) ### Bug: TENSORFLOW framework type got wrong storage class FrameworkType.TENSORFLOW was not in the ObjStoreLibStorage branch, so TensorFlow workloads got S3Storage which double-mangled S3 URIs. Added TENSORFLOW alongside PYTORCH in the ObjStoreLibStorage dispatch. ## Config handling (utils/config.py) ### Bug: build_sample_map_iter() called os.path.abspath() on S3 URIs os.path.abspath("s3://bucket/path") returns a local path like /current/dir/s3:/bucket/path, breaking all sample map construction for object storage workloads. Fixed with a StorageType.LOCAL_FS guard so abspath() is only called for local filesystem paths. ## AIStore storage (storage/aistore_storage.py) ### Bug: import-time logging.warning() fired unconditionally The except ImportError block emitted a logging.warning() even when aistore was not installed and the user had no intention of using AIStoreStorage. Moved the error to __init__() so it only fires when actually instantiated. ## Tests ### tests/test_data_generator_improvements.py (new, 24 tests) Unit and integration tests covering: - _file_seed() determinism and rank-uniqueness - _generate_files() template invocation count - NPZ BytesIO vs getvalue() correctness - Per-format generator smoke tests (mock storage) - MPI rank seeding uniqueness ### tests/test_s3dlio_object_store.py (new, 8 tests) End-to-end integration tests against real MinIO (opt-in via env var): - Full put + verify + get cycle for all 8 formats - All 8/8 formats confirmed passing: npy, npz, hdf5, parquet, csv, jpeg, png, tfrecord ## Documentation - docs/data_generator_analysis.md: implementation summary covering all bugs fixed, new readers added, test results, and file change inventory

feat: full object storage support all formats — generators, readetd

… warnings - Disable dftracer entirely: no import, always-active no-op stubs in utility.py; remove dftracer globals/calls from main.py; configure_dftracer/finalize_dftracer become no-ops in config.py; set_dftracer_initialize/finalize kept as no-ops - Change default multiprocessing_context from 'fork' to 'spawn' (avoids deadlocks in multi-threaded test processes); remove fork guard from configure_dlio_logging - Fix pin_memory: AND with torch.cuda.is_available() so no UserWarning on CPU hosts - Fix NumPy empty-slice warnings: guard io_save/duration_save stats with len() > 0 check, consistent with existing io_load guard in statscounter.py - Object-storage tests strictly opt-in via DLIO_OBJECT_STORAGE_TESTS=1 env var - Add DALI skip guards; make dftracer optional in pyproject.toml/setup.py - Fix DLIOMPI singleton reset in all test finalize() methods - Fix generate_random_shape to use seeded RNG (deterministic) - Remove dead duplicate OmegaConf call in test_npy_reader_compatibility - Remove dlp_logger/dftracer finalizer from TorchDataset worker

PR3 — test infra hardening, disable dftracer, spawn MP, fix…

- bench_generation.py: JPEG/PNG fast-path vs PIL encode speedup - bench_readers.py: reader parity baseline benchmark - bench_readers2.py: decode cost isolation + parallel prefetch analysis - bench_config_fixes.py: 16-case behavioral verification of config.py fixes (iterative sampler bug, multiprocessing_context auto-derive, read_threads auto-size) - dlio_fix_verification_report.md: full results report for PR submissions All scripts run via: uv run python tests/PRs-12-Apr-26/<script>.py

…r2026 test: add PR verification benchmarks and report (April 12, 2026)

…d (Issues 9,10,11,12,13,6b) PR-13 (Issue 12): Add ranks_per_node() to DLIOMPI; use it for read/write thread auto-sizing instead of total comm_size so multi-node configs divide CPUs by ranks-per-node, not total world size. PR-14 (Issues 10+11+6b): Parallel data generation via ThreadPoolExecutor. - Add write_threads field to ConfigArguments (default 1, auto-sized like read_threads) - Rewrite _generate_files() with two-phase design: pre-derive RNG seeds sequentially (preserves determinism), then dispatch writes in parallel worker threads - Each worker gets its own np.random.default_rng(seed) — no shared state - Identical output with any write_threads value (determinism guaranteed) - Add clarifying comment to DataGenerator.__init__ re: Issue 6b (non-bug confirm) PR-12 (Issue 9): Storage environment-variable overrides in _apply_env_overrides(). - DLIO_STORAGE_LIBRARY → storage_options['storage_library'] - DLIO_BUCKET → storage_root (if unset) - DLIO_STORAGE_TYPE → storage_type (if unset) - AWS_ACCESS_KEY_ID → storage_options['access_key_id'] - AWS_SECRET_ACCESS_KEY→ storage_options['secret_access_key'] - AWS_ENDPOINT_URL → storage_options['endpoint_url'] - AWS_REGION → storage_options['region'] - YAML/CLI values always win (env vars only fill unset fields) - Optional dotenv dict support with env vars taking priority PR-15 (Issue 13): Post-generation settle guard for eventual-consistency stores. - Add post_generation_settle_seconds: float = 0.0 to ConfigArguments - Load from config['storage']['post_generation_settle_seconds'] in LoadConfig() - Add _apply_settle_guard(args, comm) to main.py: rank-0 sleeps then barrier - Guard is no-op for LOCAL_FS or when settle=0 (zero behaviour change by default) - Call after generation barrier in DLIOBenchmark.initialize() Tests: tests/test_remaining_issues.py — 28 tests, all pass

fix: parallel generation, storage env vars, MPI topology, etc.

The parallel generation feature (PR #12) auto-sizes write_threads and read_threads to cpu_count // ranks_per_node, which on a 64-core dev box produced 8 threads per test. Running the full suite with that many threads caused resource contention between concurrent mpirun subprocesses, leading to intermittent failures (empty stdout/stderr, return code 1). Changes: - config.py: replace hardcoded cap of 8 with int(os.environ.get('DLIO_MAX_AUTO_THREADS', '8')) so the ceiling is overridable without code changes - tests/conftest.py: set DLIO_MAX_AUTO_THREADS=2 via os.environ.setdefault so all tests (local and CI) run with a known bounded thread count - .github/workflows/ci.yml: add DLIO_MAX_AUTO_THREADS: '2' to the job env block so GitHub Actions runners (2 vCPUs) are not over-committed No behaviour change for production runs (default cap remains 8).

…ability fix: cap auto-sized thread count in tests and CI (DLIO_MAX_AUTO_THREADS)

…heck

russfellows · 2026-04-13T17:42:26Z

Working on fixing the problematic "pydftracer" code, and also limiting the tests to use only Python 3.12. Fingers crossed.

…breaks via-setup Preflight)

….py install_requires when [project] table exists)

…tension dftracer.dftracer required by Preflight)

…last=True semantics PyTorch DataLoader uses drop_last=True which discards the final incomplete batch. The TF path goes through FormatReader.next() which was padding the last batch to batch_size — causing extra fetch_iter events and failing assertions: expected_iters = num_epochs * (num_data_pp // batch_size) # floor division Removing the padding means an incomplete last batch is silently dropped (the yield only fires when len(batch) == batch_size). This matches PyTorch behavior and fixes test_ai_logging_train[tensorflow-*] failures.

…batch count mismatch Auto-threading (PR-5/PR-13) can set read_threads>1 in CI. With multiple TF interleave threads, samples are partitioned per-thread, so each thread yields floor(samples_per_thread/batch_size) batches — the sum is less than floor(total_samples/batch_size), breaking the assertion: expected_iters = num_epochs * (num_data_pp // batch_size) e.g. 10 samples, bs=2, read_threads=2: expected = 10//2 = 5 but actual = 2*(floor(5/2)) = 4 The test was written assuming single-threaded iteration. Pin read_threads=1 to match that assumption. test_ai_logging_train_with_step is unaffected as it already parametrizes read_threads explicitly.

…LE env var The d9f175b commit hardcoded DFTRACER_ENABLE=False and replaced all dftracer code with permanent no-op stubs. This broke dlio_ai_logging_test.py entirely, since those tests require actual .pfw trace files to be written. - utility.py: restore conditional dftracer.python import; fall back to no-op stubs only when the library is absent or raises ImportError - config.py: restore configure_dftracer() to call PerfTrace.initialize_log() when DFTRACER_ENABLE is True; restore finalize_dftracer() to flush - main.py: restore dftracer_initialize/finalize/dftracer globals, the call to configure_dftracer() in __init__(), and the finalize_dftracer() call in finalize(); restore set_dftracer_initialize/finalize to update globals CI sets DFTRACER_ENABLE=1, which now causes the real dftracer C extension to initialize and write .pfw trace files, allowing test_ai_logging to pass.

russfellows · 2026-04-13T19:02:38Z

It turns out that problematic DFTracer isn't used in the REAL code, but used extensively during the CI tests. So needed to restore it. I tried to disable requiring it, which worked when running real tests, but not CI tests. Fun stuff.

…alize() Without 'global dftracer' in __init__, the assignment: dftracer = self.args.configure_dftracer(...) created a local variable, leaving the module-level dftracer=None. finalize() then saw None and never called finalize_dftracer(). TensorFlow calls os._exit() during shutdown, bypassing Python atexit handlers — so without explicit finalize_dftracer(), the .pfw trace file is never flushed. PyTorch exits cleanly so dftracer's atexit fires as a fallback, which is why pytorch tests passed but tensorflow tests did not.

russfellows · 2026-04-13T19:49:08Z

Still lots of issues with the massive number of tests all testing (now) incorrect assumptions, like "did we manipulate an image correctly?" The answer now is "no, and we don't care." So lots of bad, outdated assumptions baked into these hundreds of tests cases.

…s as manual - Rename ci.yml → integration.yml; change trigger to workflow_dispatch only so the full 21-suite run is manual-only (GitHub Actions UI). - Add .github/workflows/fast-ci.yml: 3-leg matrix (via-uv, via-setup, via-reqs) that runs on every push and PR, completes in < 10 minutes. - Add tests/test_fast_ci.py: 65-test fast CI suite covering preflight imports, enumerations, utilities, config defaults/derive, generator and reader factories, NPY/NPZ/HDF5/image I/O, MPI smoke (np=2), and an end-to-end generate+train pipeline.

russfellows · 2026-04-15T00:59:14Z

Update the CI tests. Now they are LIGHT years faster.

pyarrow is a core dependency in pyproject.toml but was missing from requirements-test.txt, causing test_pyarrow to fail in the via-reqs matrix leg.

russfellows · 2026-04-15T01:09:24Z

Fixed missing pyarrow in requirements.txt

dpsi and others added 30 commits January 14, 2026 11:07

Fix s3pytorch force path style boolean option.

585f375

Refactor S3 pytorch implementation.

7078286

Change code to use storage_root config option and namespace. Removes urlparsing for each I/O. Updates some default config options to be sane for both file and object.

Merge remote-tracking branch 'upstream/main' into feature/object-stor…

f0c3743

…age-integration_2026-0318

Merge pull request #1 from russfellows/feature/object-storage-integra…

a207bcb

…tion_2026-0318 feat: multi-library S3 object storage integration (minio, s3dlio, s3torchconnector)

Merge pull request #2 from russfellows/feature/parquet-readers-and-it…

4d5703c

…erable-s3_2026-0319 feat: add parallel S3 iterable readers and parquet byte-range support

Merge pull request #3 from russfellows/feat/obj-store-checkpointing

3f37071

Feat/obj store checkpointing

Merge pull request #4 from russfellows/feat/multi-lib-storage-readers…

8ad890f

…-and-cleanup Feat/multi lib storage readers and cleanup

Merge pull request #5 from russfellows/bugs/checkpoint-fixes

ca08e29

feat: multi-library object-store checkpointing fixes

Merge pull request #6 from russfellows/feat/fix-object-gen-and-readers

7a79779

feat: full object storage support all formats — generators, readetd

russfellows and others added 10 commits April 12, 2026 21:32

Merge pull request #10 from russfellows/fix/test-infra-hardening-pr

d8414bf

PR3 — test infra hardening, disable dftracer, spawn MP, fix…

Merge pull request #11 from russfellows/feat/pr-verification-tests-ap…

67c9c6d

…r2026 test: add PR verification benchmarks and report (April 12, 2026)

Merge pull request #12 from russfellows/fix/issues-9-10-11-12-13-6b

7d3348f

fix: parallel generation, storage env vars, MPI topology, etc.

Merge pull request #13 from russfellows/fix/ci-thread-cap-and-test-st…

c031c3f

…ability fix: cap auto-sized thread count in tests and CI (DLIO_MAX_AUTO_THREADS)

Merge branch 'main' into upstream-pr/all-fixes-apr2026

4767619

chore: remove uv.lock, internal docs, and bench scripts from upstream PR

c21a37c

russfellows requested a review from a team April 13, 2026 16:30

russfellows added 2 commits April 13, 2026 11:37

chore: lock Python to 3.12 only (>=3.12,<3.13)

bdd7901

fix: add pydftracer to setup.py test deps for Preflight C extension c…

af94edf

…heck

russfellows added 7 commits April 13, 2026 11:49

fix: restore pydftracer to core_deps (was removed in earlier commit, …

73d87c1

…breaks via-setup Preflight)

fix: add pydftracer to pyproject.toml dependencies (pip ignores setup…

8c66d92

….py install_requires when [project] table exists)

fix: add dftracer>=2.0.1 to pyproject.toml test extras (provides C ex…

9f9c800

…tension dftracer.dftracer required by Preflight)

fix: add pyproject.toml to CI cache key to prevent stale venv reuse

9103266

russfellows added 2 commits April 13, 2026 13:13

ci: remove test_ai_logging step — dftracer integration tests not needed

6ceb009

russfellows mentioned this pull request Apr 14, 2026

Add arrow reader and generator #14

Open

ci: add pyarrow to requirements-test.txt for via-reqs leg

7da0b04

pyarrow is a core dependency in pyproject.toml but was missing from requirements-test.txt, causing test_pyarrow to fail in the via-reqs matrix leg.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: parallel generation, storage env vars, MPI topology, settle guard, CI thread cap#12

fix: parallel generation, storage env vars, MPI topology, settle guard, CI thread cap#12
russfellows wants to merge 85 commits intomlcommons:mainfrom
russfellows:upstream-pr/all-fixes-apr2026

russfellows commented Apr 13, 2026

Uh oh!

russfellows commented Apr 13, 2026

Uh oh!

russfellows commented Apr 13, 2026

Uh oh!

russfellows commented Apr 13, 2026

Uh oh!

russfellows commented Apr 15, 2026

Uh oh!

russfellows commented Apr 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

Conversation

russfellows commented Apr 13, 2026

Overview

Change Details

1 — MPI Topology Fix for Thread Auto-Sizing (Issue 12)

2 — Parallel Data Generation (Issues 10, 11, 6b)

3 — Storage Environment-Variable Overrides (Issue 9)

4 — Post-Generation Settle Guard for Object Stores (Issue 13)

5 — CI Thread Cap (DLIO_MAX_AUTO_THREADS)

Tests

Uh oh!

russfellows commented Apr 13, 2026

Uh oh!

russfellows commented Apr 13, 2026

Uh oh!

russfellows commented Apr 13, 2026

Uh oh!

russfellows commented Apr 15, 2026

Uh oh!

russfellows commented Apr 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants