Support sandboxes in collections by josephjclark · Pull Request #4613 · OpenFn/lightning

josephjclark · 2026-04-10T16:40:04Z

Description

Sandboxes can't support projects that use collections because collection names are globally unique, and the v1 collections API has no way to disambiguate between a parent and its sandbox. This PR relaxes the uniqueness constraint to per-project and adds a v2 API that resolves collections by (project_id, name).

On the lifecycle side, empty collection records are now cloned into a sandbox when it's provisioned so that collections.get('my-collection') resolves correctly without copying production data. When a sandbox is merged back into its parent, collection names are synchronised (new collections in the sandbox are created empty in the parent, collections missing from the sandbox are removed from the parent). Collection data is never synced.

V1 continues to work for any project that doesn't have name collisions. When a v1 lookup finds the same name across multiple projects, it returns 409 Conflict pointing clients at v2. API version is selected via the x-api-version header (missing or "1" = v1, "2" = v2, anything else = 400 Bad Request). Header-based versioning was chosen in discussion with @stuartc and @josephjclark to keep a single collections base path.

Closes #3548

Validation steps

Setup: create a project with a collection

Create a new project called Cat Shelter Registry
Navigate to Project Settings > Collections and create a collection called cats

Create a workflow called "Fetch Cat Facts" with:

A cron trigger (any schedule is fine, we will run it manually)

A single job using @openfn/language-http@latest with this code:

get('https://catfact.ninja/facts?limit=10');

each('$.data.data[*]', fn(state) => {
  const fact = state.data;
  const id = fact.fact.substring(0, 20).replace(/\s/g, '-').toLowerCase();
  return collections.set('cats', `fact-${id}`, {
    fact: fact.fact,
    length: fact.length
  })(state);
});

Run the workflow manually by clicking Run on the workflow page
Navigate to Project Settings > Collections. You should see: cats listed with a storage size greater than 0
Click the download button next to the cats collection. Open the downloaded JSON file. You should see 10 items, each with a fact and length field.

This is your parent project with production data. Keep it open.

1. Sandbox provision clones empty collections

From the Cat Shelter Registry project, navigate to Sandboxes
Click Create Sandbox and name it "sandbox-test"
Once created, click into the sandbox to open it
Inside the sandbox, navigate to Project Settings > Collections

Verify: The cats collection is listed. Click the download button. Open the JSON file -- it should be an empty array []. The sandbox has the collection, but none of the parent's 10 cat facts were copied.

2. Collection data is never merged back

Inside the sandbox from step 1, open the "Fetch Cat Facts" workflow

Edit the job to fetch breed data instead:

get('https://catfact.ninja/breeds?limit=5');

each('$.data.data[*]', fn(state) => {
  const breed = state.data;
  return collections.set('cats', `breed-${breed.breed}`, {
    breed: breed.breed,
    country: breed.country,
    coat: breed.coat
  })(state);
});

Run the workflow manually
Navigate to Project Settings > Collections inside the sandbox. Click download on cats. Open the JSON -- you should see 5 breed items.
Now go back to Sandboxes and merge the sandbox into the parent
After the merge, open the parent project (Cat Shelter Registry)
Navigate to Project Settings > Collections. Click download on cats.

Verify: Open the JSON. You should still see the original 10 cat fact items (with fact and length fields). None of the sandbox's 5 breed items (with breed, country, coat fields) should be present. Collection data is never merged.

3. New collections in sandbox are created in parent on merge

Create a new sandbox from Cat Shelter Registry called "sandbox-breeds"
Open the sandbox and navigate to Project Settings > Collections
Create a new collection called cat-breeds

Open the workflow and edit the job:

get('https://catfact.ninja/breeds?limit=10');

each('$.data.data[*]', fn(state) => {
  const breed = state.data;
  return collections.set('cat-breeds', breed.breed, {
    country: breed.country,
    origin: breed.origin,
    coat: breed.coat
  })(state);
});

Run the workflow. Go to Project Settings > Collections and download cat-breeds -- you should see 10 breed items.
Go back to Sandboxes and merge this sandbox into the parent
Open the parent project and navigate to Project Settings > Collections

Verify: You should now see two collections listed: cats and cat-breeds. Click download on cat-breeds and open the JSON. It should be an empty array []. The collection was created in the parent, but the sandbox's breed data was not copied over.

4. Collections deleted in sandbox are deleted from parent on merge

In the parent project, go to Project Settings > Collections and create a collection called temp-data
Create a new sandbox called "sandbox-cleanup"
Open the sandbox and navigate to Project Settings > Collections. You should see both cats and temp-data listed.
Delete the temp-data collection from the sandbox
Confirm only cats remains in the sandbox's collections list
Go back to Sandboxes and merge this sandbox into the parent
Open the parent project and navigate to Project Settings > Collections

Verify: temp-data is no longer listed. Only cats (and cat-breeds if you did step 3) remain. The collection and all its items were removed from the parent because it was deleted in the sandbox.

API test setup

Steps 5-7 use curl to test the API directly. First, get a token and project ID:

In Lightning, click your profile icon (top right) and go to Tokens
Click Generate New Token, give it a name, and copy the token value
In your terminal, save it for the curl commands:
```
export TOKEN="paste-your-token-here"
```
To get the project ID, open the Cat Shelter Registry project in Lightning. The URL will look like http://localhost:4000/projects/abc123-def456-.../w. The UUID after /projects/ is your project ID:
```
export PROJECT_ID="abc123-def456-..."
```

5. V1 conflict detection (API)

This requires an unmerged sandbox so that cats exists in two projects. If you merged all sandboxes in previous steps, create a new one first.

# V1 lookup -- the name "cats" exists in multiple projects
curl -s -H "Authorization: Bearer $TOKEN" \
     http://localhost:4000/collections/cats | jq .

Verify: The response is 409 Conflict:

{
  "error": "Multiple collections found with this name. Use API v2 with a project_id."
}

Now use v2 with the project ID to resolve it:

curl -s -H "Authorization: Bearer $TOKEN" \
     -H "x-api-version: 2" \
     http://localhost:4000/collections/$PROJECT_ID/cats | jq .

Verify: The response is 200 with the parent's cat fact items.

6. Version header validation (API)

curl -s -H "Authorization: Bearer $TOKEN" \
     -H "x-api-version: 99" \
     http://localhost:4000/collections/cats | jq .

Verify: The response is 400 Bad Request:

{
  "error": "Unsupported API version: \"99\". Supported versions: 1, 2."
}

7. V1 backwards compatibility

Delete all sandboxes first (so cats only exists in one project), or use a project with a uniquely named collection.

curl -s -H "Authorization: Bearer $TOKEN" \
     http://localhost:4000/collections/cats | jq .

Verify: The response is 200 with the collection items, exactly as it worked before this change. No x-api-version header is needed when the collection name is unambiguous.

Additional notes for the reviewer

AI Usage

I have used Claude Code
I have used another model
I have not used AI

Pre-submission checklist

I have performed an AI review of my code (we recommend using /review with Claude Code)
I have implemented and tested all related authorization policies.
I have updated the changelog.
I have ticked a box in "AI usage" in this PR

josephjclark · 2026-04-13T09:46:53Z

Testing and QA notes:

The existing worker & collections adaptor should continue to work against the new lightning server, no changes needed
If you create two collections with the same id, and request that id through the legacy API, you should get an error
After updating the adaptor and worker, this branch should return the correct collection (even for duplicate collection names)

josephjclark · 2026-04-13T13:32:41Z

@elias-ba one thing I didn't think to check is access control. A project should not be able to access the collections of another project.

How can we enforce this?

Using the CLI you can send any access token you want and so long as the project is within that access token's scope, you're good.

In the worker, we send a JWT which I believe is scoped only to access for that workflow.

That must be sufficient right? We just look at the scope of the access token?

codecov · 2026-04-13T16:28:40Z

Codecov Report

❌ Patch coverage is 98.98990% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 89.63%. Comparing base (46d36bc) to head (d1fa4e3).

Files with missing lines	Patch %	Lines
...ightning_web/controllers/collections_controller.ex	96.29%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #4613      +/-   ##
==========================================
+ Coverage   89.56%   89.63%   +0.06%     
==========================================
  Files         444      445       +1     
  Lines       21505    21570      +65     
==========================================
+ Hits        19262    19335      +73     
+ Misses       2243     2235       -8

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

elias-ba · 2026-04-13T16:30:59Z

Good question @josephjclark , and yes, the access token scope is sufficient. The controller already enforces it.

Every action calls authorize/2 which hits Lightning.Policies.Collections.authorize/3. That policy has two clauses:

For a user subject, it delegates to ProjectUsers.authorize(:access_project, user, ...) which checks membership against the collection's project
For a run subject, it directly compares Runs.get_project_id_for_run(run) == collection.project_id

So a CLI user with a PAT can only touch collections in projects they're a member of, and a worker's run token can only touch collections in the same project as the run it was issued for. There's a test that specifically covers the worker case ("with a run token, cannot access a different project's collection" returns 401) and another for the user case ("returns 401 when user does not have access to the project").

One subtle thing worth noting: v2 makes this tighter by design. With v1, a request with a stale or leaked token could at least look up a collection by name globally; v2 requires you to know (and be authorized for) the project_id upfront.

elias-ba

Hey @josephjclark this is now looking good to me and it has all things we discussed this morning. Please do test / QA this extensively and let me know if you find anything weird

github-actions · 2026-04-13T16:54:09Z

Security Review

⚠️ Automated security review did not complete.

Claude hit the max-turns limit or encountered an error before posting findings.
A manual review of S0 (project-scoped data access), S1 (authorization policies),
and S2 (audit trail coverage) is recommended for this PR.

See the workflow run for details.

midigofrank

@elias-ba this looks great, I'm a little worried with the approach to remove the routes from the router. One thing that worries me is that the collections_router plug hijacks all actions and manually calls the controller, in my mind I'm expecting it handle the x-api-version and backward compatibility only. I don't like that we're losing visibility of the routes from the router.ex

I don't know the exact discussion you had with @stuartc , so I have added him as a reviewer as he has more context. Everything else looks perfect.

Replaces the global unique index on collections.name with a per-project (project_id, name) index, allowing sandboxes to share collection names with their parent. Updates the schema constraint to match and adds conflict detection to get_collection/1, plus a new get_collection/2 that takes a project_id for unambiguous lookup.

Adds a :conflict clause to FallbackController so that when get_collection/1 finds the same collection name in multiple projects, all actions return a 409 with a message directing clients to use the v2 API.

Fixes the create_collection test to match the new constraint name (collections_project_id_name_index). Adds tests for get_collection/1 returning :conflict when the same name exists in multiple projects, and for get_collection/2 resolving unambiguously by project_id. Adds controller tests asserting all v1 endpoints return 409 when a name conflict exists.

Replaces the explicit collections routes with a single wildcard that dispatches to v1 or v2 logic based on the X-API-VERSION header. V2 routes include project_id in the path, resolving collections unambiguously without name-collision risk. Also migrates the download endpoint to v2 (/:project_id/:name) and updates the LiveView download link accordingly.

Covers all v2 endpoints (GET item, stream, PUT, POST, DELETE item, DELETE all), including access control with personal and run tokens, and the key scenario where v2 resolves correctly by project_id when the same collection name exists in multiple projects.

Adds clone_collections_from_parent/2 to the sandbox provisioning pipeline. When a sandbox is forked, empty collection records matching each parent collection name are created in the new project. Items are not copied. Uses on_conflict: :nothing so re-provisioning is safe.

Adds sync_collections/2, called after a successful merge. Collections present in the sandbox but not the parent are created (empty) in the parent. Collections present in the parent but not the sandbox are deleted from the parent along with all their items. Collections in both are left untouched. Data is never merged.

Extends the merge confirmation warning to explain that collection names will be synchronized: new sandbox collections are created empty in the target, and collections deleted in the sandbox are permanently removed from the target including all their data.

The single dispatch function had a cyclomatic complexity of 14 (max 9). Splitting by version brings each function to 7 branches.

Address review feedback on the sandbox collections work: - Add a plug that validates the x-api-version header early and returns 400 for unsupported values instead of silently falling back to v1 - Consolidate v1/v2 controller actions through a shared resolve/1 helper, dropping ~100 lines of duplication - Return 422 with a clear error on missing value/items in request body (previously raised FunctionClauseError) - Move sync_collections into the Sandboxes context, wrap creates and deletes in a single transaction, and replace the N delete_collection calls with a single batch delete_all - Scope the collection unique constraint error to :name instead of :project_id so the user-facing field reports the conflict - Tone down the merge warning copy while keeping the essential info Adds tests for unsupported/garbage x-api-version values, malformed request bodies, and transaction rollback behavior for sync_collections.

- Log a warning when collection sync fails after merge instead of silently ignoring the error - Remove on_conflict: :nothing from insert_empty_collections since both callers guarantee no duplicates can exist - Rename misleading test to match what it actually verifies

- Use project-scoped get_collection/2 in CollectionsComponent to prevent MatchError when collection names collide across projects - Guard migration rollback with duplicate name check so it fails with a clear message instead of a cryptic Postgres error - Make sync_collections failure fail the merge instead of silently logging, and skip the GitHub commit when collections fail to sync - Bind MapSet.difference outside the comprehension in sync_collections - Add v2 write-path test with run token to cover the primary worker path

Introduces LightningWeb.Plugs.VersionedRouter, a reusable behaviour for header-based API versioning. Instead of `match :*, "/*path"` in the Phoenix router, the collections scope now uses `forward` to a router plug that resolves the version and delegates to V1Routes or V2Routes modules with explicit pattern matching per version. The controller keeps only business logic; routing and param assembly live in the version-specific route modules. The VersionedRouter plug handles fallback controller integration so actions can still return error tuples.

Replaces the 4-file VersionedRouter macro abstraction with a single plug at plugs/collections_router.ex. Inlines the ApiVersion logic so version resolution, routing, and fallback handling all live in one place. The router uses forward instead of match :*.

The controller actions return error tuples from with chains (handled by the fallback controller), but their specs claimed Plug.Conn.t() only. This caused a pattern_match_cov warning in the CollectionsRouter plug. Broaden the return types to Plug.Conn.t() | term().

The full merge sequence (workflow import + collection sync) now lives in the Sandboxes context instead of the LiveView. Any caller gets collection sync automatically, not just the UI.

github-project-automation bot added this to Core Apr 10, 2026

github-project-automation bot moved this to New Issues in Core Apr 10, 2026

josephjclark commented Apr 13, 2026

View reviewed changes

Comment thread lib/lightning_web/controllers/collections_controller.ex Outdated

elias-ba reviewed Apr 13, 2026

View reviewed changes

Comment thread lib/lightning_web/controllers/collections_controller.ex Outdated

Comment thread lib/lightning_web/controllers/collections_controller.ex Outdated

Comment thread lib/lightning_web/live/sandbox_live/index.ex Outdated

elias-ba marked this pull request as ready for review April 13, 2026 16:50

elias-ba approved these changes Apr 13, 2026

View reviewed changes

elias-ba requested review from brandonjackson and theroinaochieng April 14, 2026 23:56

midigofrank requested a review from stuartc April 15, 2026 06:31

midigofrank requested changes Apr 15, 2026

View reviewed changes

github-project-automation bot moved this from New Issues to In review in Core Apr 15, 2026

midigofrank mentioned this pull request Apr 15, 2026

Preview first record of a collection #4619

Open

7 tasks

josephjclark and others added 13 commits April 15, 2026 11:40

Return 409 when v1 collection lookup finds a name conflict

4f5d3da

Adds a :conflict clause to FallbackController so that when get_collection/1 finds the same collection name in multiple projects, all actions return a 409 with a message directing clients to use the v2 API.

Add v2 collections API tests

b01d53a

Covers all v2 endpoints (GET item, stream, PUT, POST, DELETE item, DELETE all), including access control with personal and run tokens, and the key scenario where v2 resolves correctly by project_id when the same collection name exists in multiple projects.

Split dispatch/2 into dispatch_v1 and dispatch_v2 to satisfy credo

067c749

The single dispatch function had a cyclomatic complexity of 14 (max 9). Splitting by version brings each function to 7 branches.

Update changelog for sandbox collections support

c8e7c58

Test dispatch fallback clauses for unsupported paths and methods

ce0dee9

Test api version plug rejects multiple header values

17aaa1c

elias-ba added 12 commits April 15, 2026 11:40

Use explicit up/down in collections migration with irreversible guard

1687260

Allow migration rollback when no duplicate collection names exist

57434b1

Restore on_conflict: :nothing to guard against concurrent merges

e5c904b

Move merge pipeline into Sandboxes.merge/4

b15c804

The full merge sequence (workflow import + collection sync) now lives in the Sandboxes context instead of the LiveView. Any caller gets collection sync automatically, not just the UI.

Fix alias ordering in Sandboxes module

83a073a

Test that collection sync failure shows flash error on merge

a36bad3

Test Sandboxes.merge/4 including default opts

d1fa4e3

stuartc force-pushed the sandbox-collections branch from 315f949 to d1fa4e3 Compare April 15, 2026 09:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support sandboxes in collections#4613

Support sandboxes in collections#4613
josephjclark wants to merge 25 commits intomainfrom
sandbox-collections

josephjclark commented Apr 10, 2026 •

edited by elias-ba

Loading

Uh oh!

Uh oh!

josephjclark commented Apr 13, 2026

Uh oh!

josephjclark commented Apr 13, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

codecov bot commented Apr 13, 2026 •

edited

Loading

Uh oh!

elias-ba commented Apr 13, 2026 •

edited

Loading

Uh oh!

elias-ba left a comment •

edited

Loading

Uh oh!

github-actions bot commented Apr 13, 2026

Uh oh!

midigofrank left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

josephjclark commented Apr 10, 2026 • edited by elias-ba Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Validation steps

Setup: create a project with a collection

1. Sandbox provision clones empty collections

2. Collection data is never merged back

3. New collections in sandbox are created in parent on merge

4. Collections deleted in sandbox are deleted from parent on merge

API test setup

5. V1 conflict detection (API)

6. Version header validation (API)

7. V1 backwards compatibility

Additional notes for the reviewer

AI Usage

Pre-submission checklist

Uh oh!

Uh oh!

josephjclark commented Apr 13, 2026

Uh oh!

josephjclark commented Apr 13, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

codecov bot commented Apr 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

elias-ba commented Apr 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

elias-ba left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Apr 13, 2026

Security Review

Uh oh!

midigofrank left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

josephjclark commented Apr 10, 2026 •

edited by elias-ba

Loading

codecov bot commented Apr 13, 2026 •

edited

Loading

elias-ba commented Apr 13, 2026 •

edited

Loading

elias-ba left a comment •

edited

Loading