With arraybridge, users declare their function's memory type via decorators:
```python
@cupy
def sobel_2d(image):
    """Edge detection on GPU with automatic OOM recovery."""
    return cp.gradient(image)
```
The decorator handles DLPack zero-copy transfers when available, falls back to NumPy bridging otherwise, manages per-thread CUDA streams for safe parallelization, detects and recovers from GPU out-of-memory errors, and preserves dtypes through conversions.
# Statement of Need
Writing correct multi-framework code requires understanding all these differences. `arraybridge` consolidates this knowledge into a single declarative configuration, generating 36 conversion methods (6 frameworks × 6 target types) from ~450 lines of configuration rather than hand-written code for each path.
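The configuration-driven approach can be sketched with a toy table. The two "frameworks" and the `make_converter` helper below are illustrative stand-ins, not arraybridge's actual internals; the point is that every source/target pair is generated from one declarative table rather than hand-written:

```python
# Simplified sketch of config-driven conversion-method generation.
# arraybridge's real table covers six frameworks; these two toy
# "frameworks" and make_converter() are illustrative only.

TOY_CONFIG = {
    "tuplelike": {"to_canonical": tuple, "from_canonical": tuple},
    "listlike": {"to_canonical": tuple, "from_canonical": list},
}

def make_converter(source, target):
    """Compose source -> canonical and canonical -> target from the table."""
    to_c = TOY_CONFIG[source]["to_canonical"]
    from_c = TOY_CONFIG[target]["from_canonical"]
    def convert(data):
        return from_c(to_c(data))
    return convert

# Every source/target pair is generated from the table, none hand-written:
CONVERTERS = {
    (src, dst): make_converter(src, dst)
    for src in TOY_CONFIG
    for dst in TOY_CONFIG
}
```

Scaling the same pattern to six frameworks yields the 36 generated methods with no per-pair code.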
**DLPack**[@dlpack] provides a zero-copy tensor sharing protocol adopted by all major frameworks. However, DLPack handles only the data transfer—users must still detect framework types, handle fallbacks when DLPack fails, manage device placement, and deal with framework-specific exceptions.
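The DLPack-first, fallback-second layering can be demonstrated CPU-only, since NumPy itself implements the DLPack protocol (`np.from_dlpack`, NumPy >= 1.22). The helper name below is hypothetical, not arraybridge's API:

```python
import numpy as np

# Sketch of the DLPack-first, copy-based-fallback strategy. The helper
# to_numpy_via_dlpack() is hypothetical, not arraybridge's function.

def to_numpy_via_dlpack(array):
    try:
        return np.from_dlpack(array)   # zero-copy when the producer supports DLPack
    except (AttributeError, TypeError, RuntimeError):
        return np.asarray(array)       # fallback path: may copy

a = np.arange(4.0)
b = to_numpy_via_dlpack(a)             # DLPack path: b shares a's buffer
c = to_numpy_via_dlpack([0.0, 1.0])    # plain list: falls back to np.asarray
```

The try/except structure is the essence of the fallback layering; the real library additionally handles device placement and framework-specific exceptions around this core.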
**Framework-specific utilities** (`torch.from_numpy()`, `cupy.asarray()`) handle only their own framework pairs. They provide no OOM recovery, no stream management, and no dtype preservation guarantees.
`arraybridge` differs in four key areas:
1. **Unified conversion API**: Single function for all 36 source/target combinations
# Similar entries for TORCH, TENSORFLOW, JAX, PYCLESPERANTO, NUMPY
}
```
At import time, converter classes are generated dynamically via `AutoRegisterMeta` from the `metaclass-registry` library[@metaclassregistry]. Each converter implements `to_numpy()`, `from_numpy()`, `from_dlpack()`, and `to_X()` methods for all target frameworks. Adding a seventh framework requires only adding its entry to `_FRAMEWORK_CONFIG`—no new code paths.
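The auto-registration mechanism can be illustrated with a minimal stand-in metaclass; the real `AutoRegisterMeta` in `metaclass-registry` is more general, but the principle is the same:

```python
# Minimal sketch of metaclass-based auto-registration. REGISTRY and
# AutoRegisterSketch are simplified stand-ins for metaclass-registry's
# machinery, shown only to illustrate the pattern.

REGISTRY = {}

class AutoRegisterSketch(type):
    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        memory_type = namespace.get("memory_type")
        if memory_type is not None:      # skip the abstract base class
            REGISTRY[memory_type] = cls

class BaseConverter(metaclass=AutoRegisterSketch):
    memory_type = None

class NumpyConverter(BaseConverter):
    memory_type = "numpy"

class CupyConverter(BaseConverter):
    memory_type = "cupy"

# Defining the class was enough; no explicit registration call was made.
```

Because registration happens in the metaclass's `__init__`, merely defining a converter class adds it to the registry, which is what makes the "add one config entry, no new code paths" property possible.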
**Thread-local GPU streams** are managed via `threading.local()`. Without this, multiple threads sharing a GPU would serialize on the default stream or corrupt each other's operations. With per-thread streams, true parallel GPU execution is possible:
```python
@torch
def segment_image(image):
    return model(image)  # Runs on thread-local stream
```
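The `threading.local()` pattern behind this can be sketched without a GPU. In this stand-in, a plain `object()` substitutes for a CUDA stream (a real implementation would create, e.g., `cupy.cuda.Stream()`); the names here are illustrative:

```python
import threading

# Sketch of per-thread stream management via threading.local().
# A plain object() stands in for a CUDA stream so the pattern runs
# without a GPU; get_thread_stream() is an illustrative name.

_local = threading.local()

def get_thread_stream():
    """Create the stream lazily, exactly once per thread."""
    if not hasattr(_local, "stream"):
        _local.stream = object()   # real code: a new CUDA stream
    return _local.stream

streams = {}

def worker(index):
    streams[index] = get_thread_stream()

threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Each worker thread received its own distinct stream object.
```

Because each thread lazily creates its own stream on first use, no locking or coordination is needed between workers.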
**OOM recovery** (enabled by default) unifies detection across frameworks. The library checks both exception types and error string patterns (e.g., "out of memory", "resource_exhausted"), clears framework-specific caches, and retries:
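A minimal sketch of this detect, clear, and retry loop, assuming the error-string patterns quoted above (function names here are illustrative, not arraybridge's API):

```python
# Sketch of unified OOM detection and retry. OOM_PATTERNS mirrors the
# error strings mentioned above; with_oom_recovery() is an illustrative
# helper, not arraybridge's actual function.

OOM_PATTERNS = ("out of memory", "resource_exhausted")

def is_oom_error(exc):
    message = str(exc).lower()
    return isinstance(exc, MemoryError) or any(p in message for p in OOM_PATTERNS)

def with_oom_recovery(func, clear_cache, max_retries=1):
    """Run func; on an OOM-like failure, clear caches and retry."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except Exception as exc:
            if attempt == max_retries or not is_oom_error(exc):
                raise
            # Real code clears the relevant framework cache here, e.g.
            # torch.cuda.empty_cache() or
            # cp.get_default_memory_pool().free_all_blocks()
            clear_cache()
```

String matching is needed alongside exception types because frameworks raise different exception classes (and sometimes generic `RuntimeError`s) for the same underlying GPU allocation failure.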
`arraybridge` is a core component of OpenHCS, an open-source platform for high-content screening microscopy. In OpenHCS pipelines:
- **GPU-accelerated stitching** uses CuPy for phase correlation
- **Flatfield correction** uses CuPy with OOM recovery and automatic fallback to CPU
- **Edge detection** uses CuPy with dtype preservation to maintain uint16 microscopy data
- **Deep learning segmentation** integrates PyTorch models via the `@torch` decorator
The stack utilities (`stack_slices`, `unstack_slices`) enable efficient 3D volume processing where 2D slices are stacked to GPU, processed in parallel, and unstacked back to CPU. This pattern is used throughout OpenHCS for processing microscopy Z-stacks.
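The round trip can be sketched with NumPy standing in for the GPU array library. The function bodies below are simplified CPU stand-ins that reuse the utilities' names; the real versions additionally move data between CPU and GPU:

```python
import numpy as np

# CPU-only sketch of the stack -> process -> unstack pattern. These
# simplified bodies reuse the stack_slices / unstack_slices names, but
# the real utilities also handle host/device transfer.

def stack_slices(slices):
    return np.stack(slices, axis=0)          # 2D slices -> one 3D volume

def unstack_slices(volume):
    return [volume[i] for i in range(volume.shape[0])]

# Five 4x4 uint16 "microscopy" slices, one per Z position
slices = [np.full((4, 4), z, dtype=np.uint16) for z in range(5)]
volume = stack_slices(slices)                # shape (5, 4, 4)
processed = volume * 2                       # one batched op on the whole stack
result = unstack_slices(processed)           # back to per-slice 2D arrays
```

Operating once on the stacked volume, instead of looping over slices, is what makes the batched GPU processing efficient; note the uint16 dtype survives the round trip.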
The thread-local stream management is critical for high-throughput screening where thousands of images must be processed per experiment. Multiple worker threads can process different images on the same GPU without coordination overhead.