
fix: add shutdown methods to executors#925

Open
jbusecke wants to merge 30 commits into main from executor-cleaning

Conversation

@jbusecke
Collaborator

@jbusecke jbusecke commented Mar 12, 2026

@TomNicholas and I have been mulling over a complex native zarr ingestion job for a few days now. We were ingesting many batches of large (~1TB) native zarr stores, and saw a steady increase in memory, which indicated that 'something' was holding onto memory between batches. This PR adds tests to catch this behavior, plus a fix for the lithops executor that did fix our problem for now.

jbusecke and others added 2 commits March 12, 2026 12:26
Add explicit shutdown() to SerialExecutor and DaskDelayedExecutor that
clears tracked futures. Enhance LithopsEagerFunctionExecutor.shutdown()
to clear cached _call_output on ResponseFutures before closing, preventing
memory accumulation across repeated map() calls. Add parametrized tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
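The shutdown pattern this commit describes (track futures during execution, drop the references on shutdown so results can be garbage collected) can be sketched as follows. This is a minimal illustration, not virtualizarr's actual `SerialExecutor`; the class body here is an assumption:

```python
import concurrent.futures


class SerialExecutor:
    """Hypothetical sketch: runs tasks eagerly and tracks their futures."""

    def __init__(self):
        self._futures = []

    def submit(self, fn, *args, **kwargs):
        # Execute immediately and wrap the result in an already-resolved future
        future = concurrent.futures.Future()
        try:
            future.set_result(fn(*args, **kwargs))
        except Exception as exc:
            future.set_exception(exc)
        self._futures.append(future)
        return future

    def shutdown(self, wait: bool = True, *, cancel_futures: bool = False) -> None:
        # Drop references to completed futures so their cached results
        # can be garbage collected between batches
        self._futures.clear()


executor = SerialExecutor()
f = executor.submit(lambda x: x * 2, 21)
print(f.result())  # 42
executor.shutdown()
print(len(executor._futures))  # 0
```

The signature mirrors `concurrent.futures.Executor.shutdown`, so callers can treat all executors uniformly.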
@codecov

codecov bot commented Mar 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.30%. Comparing base (2b68ec1) to head (887edac).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #925      +/-   ##
==========================================
+ Coverage   89.23%   89.30%   +0.07%     
==========================================
  Files          33       33              
  Lines        2025     2039      +14     
==========================================
+ Hits         1807     1821      +14     
  Misses        218      218              
Files with missing lines | Coverage | Δ
virtualizarr/parallel.py | 93.02% <100.00%> | +3.36% ⬆️

Comment thread virtualizarr/parallel.py Outdated
Comment thread virtualizarr/tests/test_parallel.py Outdated
Comment thread virtualizarr/tests/test_parallel.py Outdated


@pytest.mark.parametrize("executor_cls", ALL_EXECUTORS)
class TestExecutorMemory:
Member

I'm not sure if either of these tests will be reliable enough - curious about @chuckwondo's thoughts.

@jbusecke jbusecke marked this pull request as ready for review March 12, 2026 19:50
Comment thread virtualizarr/parallel.py Outdated
Comment on lines +391 to +394
# Lithops registers self.clean as an atexit handler (executors.py __init__),
# which prevents the FunctionExecutor from ever being garbage collected.
# Unregister it so the executor can be freed after shutdown.
atexit.unregister(self.lithops_client.clean)
Member

This is absolutely wild and deserves raising upstream
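A minimal reproduction of the pattern being discussed: registering a bound method with `atexit` keeps the owning instance alive, because the registry holds a strong reference to the method (and hence to `self`). The class below is a hypothetical stand-in for lithops' `FunctionExecutor`, not its real code:

```python
import atexit
import weakref


class FunctionExecutor:
    """Hypothetical stand-in: registers its own cleanup at construction,
    the way lithops registers self.clean in __init__."""

    def __init__(self):
        atexit.register(self.clean)

    def clean(self):
        pass


ex = FunctionExecutor()
ref = weakref.ref(ex)
del ex
# The atexit registry still holds a bound method referencing the instance,
# so it is never garbage collected:
print(ref() is None)  # False

# Unregistering removes that reference (bound methods compare equal if
# they share the same instance and function), freeing the instance:
atexit.unregister(ref().clean)
print(ref() is None)  # True
```

This relies on CPython's reference counting freeing the instance as soon as the last strong reference is gone.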

Collaborator

Probably so.

Member

see #926

Comment thread virtualizarr/tests/test_parallel.py Outdated
Comment thread virtualizarr/tests/test_parallel.py Outdated


@pytest.mark.parametrize("executor_cls", ALL_EXECUTORS)
class TestExecutorMemory:
Collaborator

I'm not sure these really belong here. They seem like tests that should occur upstream. If we see memory leaks in the upstream executors, we should probably be opening bugs against the appropriate repositories, no?

Member

I agree, see #926 for discussion of what we should do or not do to clean up

Collaborator Author

@jbusecke jbusecke Mar 12, 2026

I think you are right in principle, but I would propose keeping this around as at least an optional test, given the significant work that was needed to get to the bottom of this.

Member

Yeah @chuckwondo I see these tests as hopefully-temporary, but unfortunately important.

@jbusecke
Collaborator Author

Ok, Tom and I actually worked on an alternative approach where we change the lithops config to set lithops.data_cleaner to false (it is true by default and triggers the atexit registration). Combined with the added .shutdown() method on the lithops executor, this solves the problem in #926 and seems a bunch nicer than the original approach.

I have limited this to when the backend is localhost so that we leave the serverless behavior untouched for now. We could easily extend this if a user finds this error with other backends.
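The approach described above can be sketched as a small config adjustment. This is a hypothetical helper, not the PR's actual code; the key names follow lithops' config format, but how virtualizarr applies the change may differ:

```python
def adjust_config(config: dict) -> dict:
    """Disable lithops' atexit-based data cleaner, but only for the
    localhost backend, leaving serverless backends untouched."""
    config = dict(config)
    lithops_section = dict(config.get("lithops", {}))
    if lithops_section.get("backend") == "localhost":
        # data_cleaner=True (the default) makes FunctionExecutor.__init__
        # register self.clean with atexit, pinning the executor in memory
        lithops_section["data_cleaner"] = False
    config["lithops"] = lithops_section
    return config


cfg = adjust_config({"lithops": {"backend": "localhost"}})
print(cfg["lithops"]["data_cleaner"])  # False
```

Copying the dicts before mutating keeps the caller's original config untouched.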

Member

@TomNicholas TomNicholas left a comment

I think this is good, but before releasing it I want to:

  • confirm with @jbusecke that nothing else puzzling has come up wrt this rabbit hole,
  • raise an upstream issue on lithops in case we're missing something important here.

@TomNicholas TomNicholas mentioned this pull request Mar 16, 2026
The shutdown method was not clearing `lithops_client.futures` or freeing
output memory, causing test failures on Python 3.12 and 3.13.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Temporarily point lithops dep to jbusecke/lithops@fix-join-job-manager-localhostv2
to verify the upstream fix resolves the memory growth test on Linux CI.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…to exit

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…re measuring

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Comment thread virtualizarr/parallel.py
return iter(dask.compute(*delayed_tasks))

def shutdown(self, wait: bool = True, *, cancel_futures: bool = False) -> None:
self._futures.clear()
Collaborator

Can we just get rid of self._futures? It's not clear why we even hold the list of futures to begin with?

Collaborator Author

Um that is actually a good question @chuckwondo. @TomNicholas do you know why this was necessary in the case of dask?

Collaborator

Why is this the case for any of the executors?

@jbusecke
Collaborator Author

Ok, that was a frustrating day. The memory test seemed to have some sort of dependency on the running environment (it passed consistently on my mac, but never on CI). I have confirmed here that the actual end-to-end workflow which triggered this PR in the first place is indeed working as expected with this fix. I guess in the end I have come to agree with @chuckwondo that this behavior should not be tested in this package's context, and I have removed the test altogether. We are still checking that the futures are all cleared up (and I am asserting that the data_cleaner property is indeed set to False by our config modifications).

Added `.shutdown()` methods to custom executors to prevent unbounded memory growth in lithops.


Development

Successfully merging this pull request may close these issues.

Lithops FunctionExecutor memory leaks: atexit handler + unbounded futures list

3 participants