WIP: ENH: OOC architecture rewrite - new bulk I/O API and infrastructure #1568
Draft
joeykleingers wants to merge 8 commits into BlueQuartzSoftware:develop
Replace the chunk-based DataStore API with a plugin-driven hook
architecture that cleanly separates OOC policy (in the SimplnxOoc
plugin) from mechanism (in the core library). The old API required
every caller to understand chunk geometry; the new design hides OOC
details behind bulk I/O primitives and plugin-registered callbacks.
--- AbstractDataStore / IDataStore API ---
Remove the entire chunk API from AbstractDataStore and IDataStore:
loadChunk, getNumberOfChunks, getChunkLowerBounds, getChunkUpperBounds,
getChunkShape, getChunkSize, getChunkTupleShape, getChunkExtents, and
convertChunkToDataStore. Replace with two bulk I/O primitives:
copyIntoBuffer(startIndex, span<T>) and copyFromBuffer(startIndex,
span<const T>), implemented in DataStore (std::copy on raw memory) and
EmptyDataStore (throws). This shifts the abstraction from "load a
chunk, then index into it" to "copy a contiguous range into a caller-
owned buffer," which works identically for in-core and OOC stores.
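The access pattern described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the simplnx implementation: ToyStore, ToyInMemoryStore, ToyEmptyStore, and SumAll are hypothetical names, and pointer-plus-count parameters stand in for the span<T> arguments named in the commit so the sketch compiles as C++17.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the bulk I/O part of AbstractDataStore<T>.
template <typename T>
class ToyStore
{
public:
  virtual ~ToyStore() = default;
  virtual std::size_t size() const = 0;
  // Copy `count` elements starting at `startIndex` into a caller-owned buffer.
  virtual void copyIntoBuffer(std::size_t startIndex, T* buffer, std::size_t count) const = 0;
  // Copy `count` elements from a caller-owned buffer into the store.
  virtual void copyFromBuffer(std::size_t startIndex, const T* buffer, std::size_t count) = 0;
};

// In-core variant: a plain copy on contiguous memory.
template <typename T>
class ToyInMemoryStore : public ToyStore<T>
{
public:
  explicit ToyInMemoryStore(std::size_t numElements)
  : m_Data(numElements)
  {
  }
  std::size_t size() const override { return m_Data.size(); }
  void copyIntoBuffer(std::size_t startIndex, T* buffer, std::size_t count) const override
  {
    checkRange(startIndex, count);
    std::copy_n(m_Data.data() + startIndex, count, buffer);
  }
  void copyFromBuffer(std::size_t startIndex, const T* buffer, std::size_t count) override
  {
    checkRange(startIndex, count);
    std::copy_n(buffer, count, m_Data.data() + startIndex);
  }

private:
  void checkRange(std::size_t startIndex, std::size_t count) const
  {
    if(startIndex > m_Data.size() || count > m_Data.size() - startIndex)
    {
      throw std::out_of_range("range exceeds store size");
    }
  }
  std::vector<T> m_Data;
};

// Placeholder variant: knows its size but holds no data, so access throws.
template <typename T>
class ToyEmptyStore : public ToyStore<T>
{
public:
  explicit ToyEmptyStore(std::size_t numElements)
  : m_Size(numElements)
  {
  }
  std::size_t size() const override { return m_Size; }
  void copyIntoBuffer(std::size_t, T*, std::size_t) const override { throw std::runtime_error("empty store has no data"); }
  void copyFromBuffer(std::size_t, const T*, std::size_t) override { throw std::runtime_error("empty store has no data"); }

private:
  std::size_t m_Size;
};

// Caller code never sees chunk geometry: it reads contiguous ranges
// into its own buffer, whatever kind of store sits behind the interface.
inline double SumAll(const ToyStore<double>& store, std::size_t bufferSize)
{
  std::vector<double> buffer(std::max<std::size_t>(bufferSize, 1));
  double sum = 0.0;
  for(std::size_t start = 0; start < store.size(); start += buffer.size())
  {
    const std::size_t count = std::min(buffer.size(), store.size() - start);
    store.copyIntoBuffer(start, buffer.data(), count);
    for(std::size_t i = 0; i < count; ++i)
    {
      sum += buffer[i];
    }
  }
  return sum;
}
```

An out-of-core store would implement the same two methods by reading from or writing to its backing file, which is why SumAll needs no OOC-specific branch.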
Simplify StoreType to three values (InMemory, OutOfCore, Empty) by
removing EmptyOutOfCore. IsOutOfCore() now checks StoreType instead
of testing getChunkShape().has_value(). Add getRecoveryMetadata()
virtual to IDataStore for crash-recovery attribute persistence.
--- Plugin Hook System (DataIOCollection / IDataIOManager) ---
Add three plugin-registered callback hooks to DataIOCollection:
FormatResolverFnc: Decides storage format for a given array based on
type, shape, and size. Called from DataStoreUtilities::CreateDataStore
and CreateListStore. Replaces the removed checkStoreDataFormat() and
TryForceLargeDataFormatFromPrefs — format decisions now live entirely
in the plugin, with core only calling resolveFormat() when no format
is already set.
BackfillHandlerFnc: Post-import callback that lets the plugin finalize
placeholder stores after all HDF5 objects are read. Called from
ImportH5ObjectPathsAction after importing all paths. Replaces the
removed backfillReadOnlyOocStores core implementation.
WriteArrayOverrideFnc: Intercepts HDF5 writes during recovery file
creation, allowing the plugin to write lightweight placeholder
datasets instead of full array data. Activated via RAII
WriteArrayOverrideGuard, wired into DataStructureWriter.
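A rough sketch of how two of these hooks could fit together, assuming heavily simplified signatures. ToyIOCollection, ToyWriteArrayOverrideGuard, setOverrideActive, and writeArray are hypothetical names; the resolver here only looks at a byte count, whereas the commit describes a resolver that also considers type and shape.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for DataIOCollection.
class ToyIOCollection
{
public:
  // Returns a format name, or "" when the plugin has no opinion (assumed convention).
  using FormatResolverFnc = std::function<std::string(std::uint64_t dataSizeBytes)>;
  // Returns true when the plugin handled the write itself (assumed convention).
  using WriteArrayOverrideFnc = std::function<bool(const std::string& arrayName)>;

  void setFormatResolver(FormatResolverFnc fnc) { m_Resolver = std::move(fnc); }
  void setWriteArrayOverride(WriteArrayOverrideFnc fnc) { m_Override = std::move(fnc); }
  void setOverrideActive(bool active) { m_OverrideActive = active; }

  // Core consults the resolver only when no format is already set.
  std::string resolveFormat(const std::string& requested, std::uint64_t dataSizeBytes) const
  {
    if(!requested.empty() || !m_Resolver)
    {
      return requested;
    }
    return m_Resolver(dataSizeBytes);
  }

  // The writer gives an active override first chance at each object.
  std::string writeArray(const std::string& arrayName) const
  {
    if(m_OverrideActive && m_Override && m_Override(arrayName))
    {
      return "override";
    }
    return "normal";
  }

private:
  FormatResolverFnc m_Resolver;
  WriteArrayOverrideFnc m_Override;
  bool m_OverrideActive = false;
};

// RAII activation: the override is live only while the guard is in scope.
class ToyWriteArrayOverrideGuard
{
public:
  explicit ToyWriteArrayOverrideGuard(ToyIOCollection& collection)
  : m_Collection(collection)
  {
    m_Collection.setOverrideActive(true);
  }
  ~ToyWriteArrayOverrideGuard() { m_Collection.setOverrideActive(false); }
  ToyWriteArrayOverrideGuard(const ToyWriteArrayOverrideGuard&) = delete;
  ToyWriteArrayOverrideGuard& operator=(const ToyWriteArrayOverrideGuard&) = delete;

private:
  ToyIOCollection& m_Collection;
};
```

Scoping the override with a guard means a normal save that follows a recovery write cannot accidentally emit placeholder datasets, even if the recovery write exits early through an exception.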
Add factory registration on IDataIOManager for ListStoreRefCreateFnc,
StringStoreCreateFnc, and FinalizeStoresFnc, with delegating creation
methods on DataIOCollection. Guard against reserved format name
"Simplnx-Default-In-Memory" during IO manager registration.
--- EmptyStringStore Placeholder ---
Add EmptyStringStore, a placeholder class for OOC string array import
that stores only tuple shape metadata. All data access
methods throw std::runtime_error. isPlaceholder() returns true (vs
false for StringStore). StringArrayIO creates EmptyStringStore in OOC mode instead of
allocating numValues empty strings.
--- HDF5 I/O ---
DataStoreIO::ReadDataStore gains two interception paths before the
normal in-core load: (1) recovery file detection via OocBackingFilePath
HDF5 attributes, creating a read-only reference store pointing at the
backing file; (2) OOC format resolution via resolveFormat(), creating a
read-only reference store directly from the source .dream3d file with
no temp copy.
DataArrayIO::writeData always calls WriteDataStore
directly — OOC stores materialize their data through the plugin's
writeHdf5() method; recovery writes use WriteArrayOverrideFnc.
NeighborListIO gains OOC interception: computes total neighbor count,
calls resolveFormat(), and creates a read-only ref list store when an
OOC format is available. Legacy NeighborList reading passes a preflight
flag through the entire call chain (readLegacyNeighborList ->
createLegacyNeighborList -> ReadHdf5Data) so legacy .dream3d imports
create EmptyListStore placeholders instead of eagerly loading per-
element via setList().
DataStructureWriter checks WriteArrayOverrideFnc before normal writes,
giving the registered plugin callback first chance to handle each
data object.
Add explicit template instantiations for DatasetIO::createEmptyDataset
and DatasetIO::writeSpanHyperslab for all numeric types plus bool.
These are needed by the SimplnxOoc plugin's AbstractOocStore::writeHdf5(),
which cannot use writeSpan() because the full array is not in memory.
Instead it creates an empty dataset, then fills it region-by-region
via hyperslab writes as it streams data from the backing file.
--- Preferences ---
Add unified oocMemoryBudgetBytes preference (default 8 GB) that
the ChunkCache, visualization, and stride cache all use. Add k_InMemoryFormat
sentinel constant for explicit in-core format choice. Add migration
logic to erase legacy empty-string and "In-Memory" preference values.
checkUseOoc() now tests against k_InMemoryFormat.
setLargeDataFormat("") removes the key so plugin defaults take effect.
--- Algorithm Infrastructure ---
AlgorithmDispatch: Add ForceInCoreAlgorithm/ForceOocAlgorithm global
flags with RAII guards. Add DispatchAlgorithm template that selects
Direct (in-core) vs Scanline (OOC) algorithm variant based on store
types and force flags. Add SIMPLNX_TEST_ALGORITHM_PATH CMake option
(0=both, 1=OOC-only, 2=InCore-only) for dual-dispatch test control.
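The dispatch idea can be sketched roughly as below. All names (ToyStoreType, ForceInCoreFlag, ForceOocFlag, ToyForceOocGuard, DispatchToy) are hypothetical, and the precedence between the two force flags is an assumption of this sketch; the commit does not state it.

```cpp
#include <algorithm>
#include <initializer_list>
#include <string>

enum class ToyStoreType
{
  InMemory,
  OutOfCore,
  Empty
};

// Global force flags, wrapped in functions so the sketch stays header-safe.
inline bool& ForceInCoreFlag()
{
  static bool s_Flag = false;
  return s_Flag;
}
inline bool& ForceOocFlag()
{
  static bool s_Flag = false;
  return s_Flag;
}

// RAII guard: forces the OOC path for the guard's lifetime, then restores.
class ToyForceOocGuard
{
public:
  ToyForceOocGuard()
  : m_Previous(ForceOocFlag())
  {
    ForceOocFlag() = true;
  }
  ~ToyForceOocGuard() { ForceOocFlag() = m_Previous; }
  ToyForceOocGuard(const ToyForceOocGuard&) = delete;
  ToyForceOocGuard& operator=(const ToyForceOocGuard&) = delete;

private:
  bool m_Previous;
};

// Runs `scanline` when any store is out-of-core or OOC is forced,
// otherwise `direct`. Assumption: the in-core force flag wins if both are set.
template <typename DirectFn, typename ScanlineFn>
auto DispatchToy(std::initializer_list<ToyStoreType> storeTypes, DirectFn&& direct, ScanlineFn&& scanline)
{
  const bool anyOoc = std::any_of(storeTypes.begin(), storeTypes.end(), [](ToyStoreType type) { return type == ToyStoreType::OutOfCore; });
  if(ForceInCoreFlag())
  {
    return direct();
  }
  if(ForceOocFlag() || anyOoc)
  {
    return scanline();
  }
  return direct();
}
```

A test can wrap a filter run in the guard to exercise the OOC variant against in-memory data, which is the kind of dual-path coverage the CMake option controls.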
IParallelAlgorithm: Remove blanket TBB disabling for OOC data — OOC
stores are now thread-safe via ChunkCache + HDF5 global mutex.
CheckStoresInMemory/CheckArraysInMemory use StoreType instead of
getDataFormat().
VtkUtilities: Rewrite binary write path to read into 4096-element
buffers via copyIntoBuffer, byte-swap in the buffer, and fwrite —
replacing direct DataStore data() pointer access.
--- Filter Algorithm Updates ---
FillBadData: Rewrite phaseOneCCL and phaseThreeRelabeling to use
Z-slab buffered I/O via copyIntoBuffer/copyFromBuffer instead of
the removed chunk API (loadChunk, getChunkLowerBounds, etc.).
operator()() scans feature counts in 64K-element chunks via
copyIntoBuffer.
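The Z-slab pattern can be illustrated with a relabeling pass. This is only a sketch: ToyLabelStore and RelabelBySlab are hypothetical, the store is a plain vector, and labels are assumed to be non-negative indices into the lookup table. It is not the FillBadData algorithm itself.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical label store exposing only the two bulk I/O calls.
struct ToyLabelStore
{
  std::vector<std::int32_t> data;

  void copyIntoBuffer(std::size_t start, std::int32_t* buffer, std::size_t count) const
  {
    std::copy_n(data.data() + start, count, buffer);
  }
  void copyFromBuffer(std::size_t start, const std::int32_t* buffer, std::size_t count)
  {
    std::copy_n(buffer, count, data.data() + start);
  }
};

// Visit the volume one Z-slab (dimX * dimY voxels) at a time:
// read the slab, rewrite each label through the table, write the slab back.
// Assumes every label is a valid index into `newLabel`.
inline void RelabelBySlab(ToyLabelStore& store, std::size_t dimX, std::size_t dimY, std::size_t dimZ, const std::vector<std::int32_t>& newLabel)
{
  const std::size_t slabSize = dimX * dimY;
  std::vector<std::int32_t> slab(slabSize);
  for(std::size_t z = 0; z < dimZ; ++z)
  {
    const std::size_t start = z * slabSize;
    store.copyIntoBuffer(start, slab.data(), slabSize);
    for(std::int32_t& label : slab)
    {
      label = newLabel[static_cast<std::size_t>(label)];
    }
    store.copyFromBuffer(start, slab.data(), slabSize);
  }
}
```

Only one slab is held in memory at a time, so peak buffer use depends on the X and Y dimensions rather than on the whole volume.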
QuickSurfaceMesh: Remove getChunkShape() call in generateTripleLines()
that set ParallelData3DAlgorithm chunk size, as the chunk API no
longer exists on AbstractDataStore.
--- File Import ---
ImportH5ObjectPathsAction: Add deferred-load pattern. When a backfill
handler is registered, pass preflight=true to create placeholder stores
during import, then call runBackfillHandler() after all paths are
imported to let the plugin finalize.
Dream3dIO: Add WriteRecoveryFile() that wraps WriteFile with WriteArrayOverrideGuard.
--- Utility Changes ---
DataStoreUtilities: Remove TryForceLargeDataFormatFromPrefs entirely.
CreateDataStore and CreateListStore call resolveFormat() on the IO
collection. ArrayCreationUtilities: check k_InMemoryFormat sentinel
before skipping memory checks.
ITKArrayHelper/ITKTestBase: OOC checks use getStoreType() instead of
getDataFormat().empty(). IsArrayInMemory simplified from a 40-line
DataType switch to a single StoreType check.
ArraySelectionParameter: Remove EmptyOutOfCore handling; simplify to
just StoreType::Empty.
--- Tests ---
Add EmptyStringStore tests (6 cases: metadata, zero tuples, throwing
access, deep copy placeholder preservation, resize, isPlaceholder).
Add DataIOCollection hooks tests (format resolver, backfill handler).
Add IOFormat tests (7 cases: InMemory sentinel, empty format,
resolveFormat with/without plugin). Add IParallelAlgorithm OOC tests
(8 cases with MockOocDataStore: TBB enablement for in-memory, OOC,
and mixed arrays/stores).
Remove the "Target DataStructure Size" test from IOFormat.cpp — it
was a tautology that re-implemented the same arithmetic as
updateMemoryDefaults() without testing any edge case or behavior.
Fix RodriguesConvertorTest exemplar data: add missing expected values
for the 4th tuple (indices 12-15). The old CompareDataArrays broke
on the first floating-point mismatch regardless of magnitude, masking
this incomplete exemplar. The new chunked comparison correctly
continues past epsilon-close differences, exposing the missing data.
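The comparison behavior described above can be sketched as follows. FirstMismatch is a hypothetical helper, not the CompareDataArrays code; it takes plain vectors and copies them into scratch buffers piece by piece to mimic the chunked reads.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <vector>

// Returns the index of the first pair that differs by more than `epsilon`,
// or std::nullopt when every pair is equal or epsilon-close.
inline std::optional<std::size_t> FirstMismatch(const std::vector<float>& computed, const std::vector<float>& exemplar, float epsilon, std::size_t chunkSize = 40000)
{
  if(computed.size() != exemplar.size())
  {
    throw std::invalid_argument("arrays must have the same length");
  }
  const std::size_t bufferSize = std::max<std::size_t>(chunkSize, 1);
  std::vector<float> bufferA(bufferSize);
  std::vector<float> bufferB(bufferSize);
  for(std::size_t start = 0; start < computed.size(); start += bufferSize)
  {
    const std::size_t count = std::min(bufferSize, computed.size() - start);
    std::copy_n(computed.data() + start, count, bufferA.data());
    std::copy_n(exemplar.data() + start, count, bufferB.data());
    for(std::size_t i = 0; i < count; ++i)
    {
      if(bufferA[i] == bufferB[i])
      {
        continue; // exact match
      }
      if(std::fabs(bufferA[i] - bufferB[i]) <= epsilon)
      {
        continue; // epsilon-close: keep scanning instead of stopping here
      }
      return start + i;
    }
  }
  return std::nullopt;
}
```

A loop that stops at the first inexact value, whether or not it is within tolerance, never reaches later elements, which is how an incomplete exemplar can go unnoticed.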
Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>
Add comprehensive documentation to all new methods, type aliases, classes, and algorithms introduced in the OOC architecture rewrite. Every new public API now has Doxygen explaining what it does, how it works, and why it is needed. Algorithm implementations have step-by-step inline comments explaining the logic.
Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>
…ation layer
Move the format resolver call site from the low-level DataStoreUtilities::CreateDataStore/CreateListStore functions up to the array creation layer (ArrayCreationUtilities::CreateArray and ImportH5ObjectPathsAction). This is a prerequisite for the upcoming data store import handler refactor.
Key architectural changes:
1. FormatResolverFnc signature expanded to (DataStructure, DataPath, DataType, dataSizeBytes). The resolver can now walk parent objects to determine geometry type, enabling it to force in-core for unstructured/poly geometry arrays without caller-side checks.
2. Format resolution removed from DataStoreUtilities::CreateDataStore and CreateListStore. These are now simple factories that take an already-resolved format string. Callers are responsible for calling the resolver.
3. CreateArrayAction no longer carries a dataFormat member or constructor parameter. The k_DefaultDataFormat constant is removed. Format is resolved at execute time inside ArrayCreationUtilities::CreateArray.
4. ImportH5ObjectPathsAction gains a format-resolver loop that iterates Empty-store DataArrays after preflight import, consulting the resolver to decide which arrays to eager-load (in-core) vs leave for the backfill handler (OOC).
5. DataStoreIO::ReadDataStore and NeighborListIO::finishImportingData lose their inline format-resolution and OOC reference-store creation code. Format decisions for imported data are now made at the action level, not during raw HDF5 I/O.
6. Geometry actions (CreateGeometry1D/2D/3DAction, CreateVertexGeometry, CreateRectGridGeometry) lose their createdDataFormat parameter. They now materialize OOC topology arrays into in-core stores when the source arrays have StoreType::OutOfCore, since unstructured/poly geometry topology must be in-core for the visualization layer.
7. CheckMemoryRequirement simplified to a pure RAM check. OOC fallback logic removed since the resolver handles format decisions upstream.
All filter callers updated to drop the dataFormat argument from CreateArrayAction constructors. Python binding updated (data_format parameter renamed to fill_value). Test files updated for new resolveFormat signature.
…arden .dream3d import
Rename the "backfill handler" to "data store import handler" and expand its role to handle ALL data store loading from .dream3d files: in-core eager loading, OOC reference stores, and recovery reattachment. This replaces the split decision-making where ImportH5ObjectPathsAction ran a format-resolver loop and a separate backfill handler.
Key changes:
1. DataIOCollection: Rename BackfillHandlerFnc to DataStoreImportHandlerFnc with expanded signature that includes importStructure. Rename set/has/runBackfillHandler to set/has/runDataStoreImportHandler. Add format display name registry (registerFormatDisplayName/getFormatDisplayNames) for human-readable format names in the UI dropdown.
2. DataStoreIO: Rename ReadDataStore to ReadDataStoreIntoMemory. Remove recovery reattachment code (OOC-specific HDF5 attribute checks moved to SimplnxOoc plugin). Add placeholder detection: compares the physical HDF5 element count against shape attributes and returns Result<> with a warning when a mismatch is detected (guards against loading placeholder datasets without the OOC plugin). Change return type to Result<shared_ptr<AbstractDataStore<T>>> so callers can accumulate warnings across arrays.
3. ImportH5ObjectPathsAction: Remove the format-resolver loop (79 lines). The action now delegates entirely to the registered handler when present, or falls back to FinishImportingObject for non-OOC builds.
4. CreateArrayAction: Restore dataFormat parameter for per-filter format override. When non-empty, bypasses the format resolver. Dropdown shows "Automatic" (resolver decides), "In Memory", and plugin-registered formats with display names. Fix 12 filter callers where fillValue was being passed as dataFormat after parameter reordering.
5. Dream3dIO: Route DREAM3D::ReadFile through ImportH5ObjectPathsAction so recovery and OOC hooks fire. Remove unused ImportDataObjectFromFile and ImportSelectDataObjectsFromFile.
6. Application: Add getDataStoreFormatDisplayNames() to expose display name registry to DataStoreFormatParameter.
Updated callers: DataArrayIO (2 sites), NeighborListIO (2 sites), Dream3dIO (2 legacy helpers), DataStructureWriter (comment), 12 filter files, simplnxpy Python binding, DataIOCollectionHooksTest.
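The placeholder detection described in this commit can be sketched as a count comparison. ToyCheckResult, ElementCount, and CheckForPlaceholder are hypothetical names, and the warning text is invented for illustration; the real function returns a Result<> and reads both values from HDF5.

```cpp
#include <cstdint>
#include <functional>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical stand-in for a Result<> that carries an optional warning.
struct ToyCheckResult
{
  bool ok = true;
  std::string warning;
};

// Product of all dimensions; an empty shape counts as one element here.
inline std::uint64_t ElementCount(const std::vector<std::uint64_t>& shape)
{
  return std::accumulate(shape.begin(), shape.end(), std::uint64_t{1}, std::multiplies<std::uint64_t>());
}

// A dataset whose stored element count disagrees with its shape attributes
// is treated as a lightweight placeholder rather than real array data.
inline ToyCheckResult CheckForPlaceholder(const std::string& arrayName, std::uint64_t physicalElements, const std::vector<std::uint64_t>& tupleShape,
                                          const std::vector<std::uint64_t>& componentShape)
{
  const std::uint64_t expected = ElementCount(tupleShape) * ElementCount(componentShape);
  if(physicalElements == expected)
  {
    return {};
  }
  return ToyCheckResult{false, "Array '" + arrayName + "' holds " + std::to_string(physicalElements) + " elements but its shape attributes describe " +
                                   std::to_string(expected) + "; it may be a placeholder dataset."};
}
```

Returning a warning instead of failing outright lets the caller keep loading the other arrays and report all suspect datasets together.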
Replace the old Dream3dIO public API (ReadFile, ImportDataStructureFromFile, FinishImportingObject) with four new purpose-specific functions:
- LoadDataStructure(path): full load with OOC handler support
- LoadDataStructureArrays(path, dataPaths): selective array load with pruning
- LoadDataStructureMetadata(path): metadata-only skeleton (preflight)
- LoadDataStructureArraysMetadata(path, dataPaths): pruned metadata skeleton
The new API eliminates the bool preflight parameter in favor of distinct functions, decouples pipeline loading from DataStructure loading, and centralizes the OOC handler integration in a single internal LoadDataStructureWithHandler function.
Key changes:
DataIOCollection: Add EagerLoadFnc typedef and pass it through the DataStoreImportHandlerFnc signature, replacing the importStructure parameter. The handler can now eager-load individual arrays via callback without knowing Dream3dIO internals.
ImportH5ObjectPathsAction: Rewrite to use the new API: preflight calls LoadDataStructureMetadata, execute calls LoadDataStructure. The action no longer manages HDF5 file handles or deferred loading directly; it merges source objects into the pipeline DataStructure via shallow copy.
ReadDREAM3DFilter: Switch preflight from ImportDataStructureFromFile(reader, true) to LoadDataStructureMetadata(path), removing manual HDF5 file open.
Dream3dIO internals: Move LoadDataObjectFromHDF5, EagerLoadDataFromHDF5, PruneDataStructure, and LoadDataStructureWithHandler into an anonymous namespace. LoadDataStructureWithHandler implements the shared logic: build metadata skeleton, optionally delegate to the OOC import handler, fall back to eager in-core loading.
Test callers: Switch ComputeIPFColorsTest, RotateSampleRefFrameTest, DREAM3DFileTest, and H5Test to UnitTest::LoadDataStructure. Add Dream3dLoadingApiTest with coverage for all four new functions.
UnitTestCommon: Simplify LoadDataStructure/LoadDataStructureMetadata helpers to delegate directly to the new DREAM3D:: functions.
Add the namespace fs = std::filesystem alias to .cpp files that spell out std::filesystem, consistent with the existing convention used throughout the codebase (e.g., AtomicFile.cpp, FileUtilities.cpp, all ITK test files, UnitTestCommon.hpp). Files updated: Dream3dIO.cpp, ImportH5ObjectPathsAction.cpp, DataIOCollection.cpp, H5Test.cpp, UnitTestCommon.cpp, DREAM3DFileTest.cpp, ComputeIPFColorsTest.cpp.
Previously IDataStore provided a default implementation that returned an empty map, which silently disabled recovery metadata for any store subclass that forgot to override it. Make it pure virtual so every concrete store must explicitly state what (if any) recovery metadata it produces.
DataStore overrides it to return an empty map (in-memory stores have no backing file or external state, so the recovery file's HDF5 dataset contains all the data needed to reconstruct the store).
EmptyDataStore overrides it to throw std::runtime_error, matching the fail-fast behavior of every other data-access method on this metadata-only placeholder class. Querying recovery metadata on a placeholder is a programming error: the real store that replaces the placeholder during execution is the one responsible for providing recovery info.
MockOocDataStore in IParallelAlgorithmTest.cpp gains a no-op override returning an empty map so it remains constructible.
Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>
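The effect of the pure virtual can be sketched with toy classes. Everything here is hypothetical: the Toy* class names, the std::map<std::string, std::string> return type, and the file-backed ToyOocStore are assumptions for illustration. Only the three behaviors (empty map, throw, explicit override required) come from the commit message.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

using ToyRecoveryMetadata = std::map<std::string, std::string>;

class ToyIDataStore
{
public:
  virtual ~ToyIDataStore() = default;
  // Pure virtual: a subclass that forgets to override no longer compiles
  // as a concrete type, instead of silently reporting "no metadata".
  virtual ToyRecoveryMetadata getRecoveryMetadata() const = 0;
};

// In-memory store: the written dataset already holds everything, so the map is empty.
class ToyDataStore final : public ToyIDataStore
{
public:
  ToyRecoveryMetadata getRecoveryMetadata() const override { return {}; }
};

// Placeholder store: asking it for recovery info is treated as a programming error.
class ToyEmptyDataStore final : public ToyIDataStore
{
public:
  ToyRecoveryMetadata getRecoveryMetadata() const override { throw std::runtime_error("placeholder store has no recovery metadata"); }
};

// Hypothetical file-backed store: reports where its data lives.
class ToyOocStore final : public ToyIDataStore
{
public:
  explicit ToyOocStore(std::string backingFile)
  : m_BackingFile(std::move(backingFile))
  {
  }
  ToyRecoveryMetadata getRecoveryMetadata() const override { return {{"OocBackingFilePath", m_BackingFile}}; }

private:
  std::string m_BackingFile;
};
```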
…y format sentinel
Addresses code review feedback on DataIOCollection ownership and factory error messages.
Ownership clarification:
* DataStoreUtilities::GetIOCollection() and Application::getIOCollection() now return DataIOCollection& instead of std::shared_ptr. The collection is owned by the Application singleton which outlives every caller, so a reference expresses non-ownership more clearly than a shared_ptr and prevents accidental lifetime extension.
* WriteArrayOverrideGuard stores a DataIOCollection& member. Since the guard is already non-copyable and non-movable, a reference member is natural and the "may be null no-op" path was dropped (no caller used it).
In-memory format sentinel hygiene:
* CoreDataIOManager::formatName() now returns Preferences::k_InMemoryFormat instead of the empty string. Empty means "unset/auto" and k_InMemoryFormat means "explicit in-memory"; previously "" was doing double duty.
* DataIOCollection constructor registers the core manager directly into the manager map, bypassing the addIOManager() guard. The guard still rejects plugin registrations under the reserved name.
* createDataStore/createListStore fallbacks now look up the core manager from m_ManagerMap under k_InMemoryFormat instead of constructing a fresh local CoreDataIOManager.
* ArrayCreationUtilities no longer translates k_InMemoryFormat to ""; the RAM-check path recognizes both sentinels as in-core.
Actionable factory errors:
* Added DataIOCollection::generateManagerListString() that produces a padded multi-line capability matrix of every registered IO manager and the store types it supports (DataStore, ListStore, StringStore, ReadOnlyRef(DataStore), ReadOnlyRef(ListStore)). Uses display names where registered, falling back to the raw format identifier.
* Wired the helper into the existing CreateArray nullptr-check error message so users can immediately see which formats are available when a requested format is unknown.
Tests updated to reflect the new reference API.
Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>
Summary
Rewrites the out-of-core (OOC) architecture in simplnx, replacing the old chunk-based API with a new bulk I/O design built around copyIntoBuffer/copyFromBuffer on AbstractDataStore. Introduces the core infrastructure that the OOC-optimized filter algorithms (separate PR #1575) build upon.
Core Architecture Changes
- Remove the chunk API from AbstractDataStore/IDataStore (loadChunk, getNumberOfChunks, getChunkLowerBounds, getChunkUpperBounds, getChunkShape)
- Add copyIntoBuffer/copyFromBuffer pure virtual bulk I/O methods to AbstractDataStore with implementations in DataStore, EmptyDataStore, and HDF5ChunkedStore (in SimplnxOoc plugin)
- Add StoreType enum (InMemory, OutOfCore, Empty) to IDataStore; IsOutOfCore() now checks StoreType instead of getChunkShape()
- HDF5ChunkedStore performs I/O via HDF5 hyperslab selections with Z-slice-aligned default chunk shape {1,Y,X} for 3D data
- copyFromBuffer fast path: skips read-modify-write for tuple-aligned writes
- copyIntoBuffer fast path: direct span-based readTuples for tuple-aligned reads
- DatasetIO gains readTuples/writeTuples for direct hyperslab-based bulk tuple I/O
New Core Utilities
- DispatchAlgorithm: Runtime dispatch between in-core (Direct) and OOC (Scanline/CCL) algorithm variants based on data store type
- SliceBufferedTransfer: Type-dispatched Z-slice buffered tuple copy utility that eliminates per-element OOC overhead during morphological transfer phases
- UnionFind: Vector-based disjoint set data structure with union-by-rank and path-halving compression for chunk-sequential CCL algorithms
- SegmentFeatures OOC path: Z-slice CCL-based connected component labeling with UnionFind equivalence tracking, replacing BFS/DFS flood fill for OOC data
- AlignSections OOC path: Bulk slice read/write with AlignSectionsTransferDataOocImpl
- DataArrayUtilities bulk I/O: ImportFromBinaryFile, AppendData, CopyData, and mirror swap_ranges updated with chunked bulk I/O (runtime OOC check preserves original in-core performance)
OOC Store Management
- DataIOCollection/IDataIOManager: Updated for OOC store lifecycle management
- ImportH5ObjectPathsAction: OOC-aware file import with recovery metadata
- DataStoreIO: Detect OOC recovery attributes in ReadDataStore for safe data restoration
Test Infrastructure
- CompareDataArrays rewritten to use copyIntoBuffer in 40K-element chunks instead of per-element operator[]
- ForceOocAlgorithmGuard for dual-path test coverage
- SIMPLNX_TEST_ALGORITHM_PATH CMake option (0=Both, 1=OOC-only, 2=InCore-only) for build-specific test path control
Related PRs
Test Plan