Conversation
|
Testing and QA notes:
|
|
@elias-ba one thing I didn't think to check is access control. A project should not be able to access the collections of another project. How can we enforce this? Using the CLI you can send any access token you want and so long as the project is within that access token's scope, you're good. In the worker, we send a JWT which I believe is scoped only to access for that workflow. That must be sufficient right? We just look at the scope of the access token? |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #4613 +/- ##
==========================================
+ Coverage 89.56% 89.63% +0.06%
==========================================
Files 444 445 +1
Lines 21505 21570 +65
==========================================
+ Hits 19262 19335 +73
+ Misses 2243 2235 -8 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Good question @josephjclark , and yes, the access token scope is sufficient. The controller already enforces it. Every action calls
So a CLI user with a PAT can only touch collections in projects they're a member of, and a worker's run token can only touch collections in the same project as the run it was issued for. There's a test that specifically covers the worker case ( One subtle thing worth noting: v2 makes this tighter by design. With v1, a request with a stale or leaked token could at least look up a collection by name globally; v2 requires you to know (and be authorized for) the project_id upfront. |
There was a problem hiding this comment.
Hey @josephjclark this is now looking good to me and it has all things we discussed this morning. Please do test / QA this extensively and let me know if you find anything weird
Security ReviewClaude hit the max-turns limit or encountered an error before posting findings. See the workflow run for details. |
midigofrank
left a comment
There was a problem hiding this comment.
@elias-ba this looks great, I'm a little worried with the approach to remove the routes from the router. One thing that worries me is that the collections_router plug hijacks all actions and manually calls the controller, in my mind I'm expecting it handle the x-api-version and backward compatibility only. I don't like that we're losing visibility of the routes from the router.ex
I don't know the exact discussion you had with @stuartc , so I have added him as a reviewer as he has more context. Everything else looks perfect.
Replaces the global unique index on collections.name with a per-project (project_id, name) index, allowing sandboxes to share collection names with their parent. Updates the schema constraint to match and adds conflict detection to get_collection/1, plus a new get_collection/2 that takes a project_id for unambiguous lookup.
Adds a :conflict clause to FallbackController so that when get_collection/1 finds the same collection name in multiple projects, all actions return a 409 with a message directing clients to use the v2 API.
Fixes the create_collection test to match the new constraint name (collections_project_id_name_index). Adds tests for get_collection/1 returning :conflict when the same name exists in multiple projects, and for get_collection/2 resolving unambiguously by project_id. Adds controller tests asserting all v1 endpoints return 409 when a name conflict exists.
Replaces the explicit collections routes with a single wildcard that dispatches to v1 or v2 logic based on the X-API-VERSION header. V2 routes include project_id in the path, resolving collections unambiguously without name-collision risk. Also migrates the download endpoint to v2 (/:project_id/:name) and updates the LiveView download link accordingly.
Covers all v2 endpoints (GET item, stream, PUT, POST, DELETE item, DELETE all), including access control with personal and run tokens, and the key scenario where v2 resolves correctly by project_id when the same collection name exists in multiple projects.
Adds clone_collections_from_parent/2 to the sandbox provisioning pipeline. When a sandbox is forked, empty collection records matching each parent collection name are created in the new project. Items are not copied. Uses on_conflict: :nothing so re-provisioning is safe.
Adds sync_collections/2, called after a successful merge. Collections present in the sandbox but not the parent are created (empty) in the parent. Collections present in the parent but not the sandbox are deleted from the parent along with all their items. Collections in both are left untouched. Data is never merged.
Extends the merge confirmation warning to explain that collection names will be synchronized: new sandbox collections are created empty in the target, and collections deleted in the sandbox are permanently removed from the target including all their data.
The single dispatch function had a cyclomatic complexity of 14 (max 9). Splitting by version brings each function to 7 branches.
Address review feedback on the sandbox collections work: - Add a plug that validates the x-api-version header early and returns 400 for unsupported values instead of silently falling back to v1 - Consolidate v1/v2 controller actions through a shared resolve/1 helper, dropping ~100 lines of duplication - Return 422 with a clear error on missing value/items in request body (previously raised FunctionClauseError) - Move sync_collections into the Sandboxes context, wrap creates and deletes in a single transaction, and replace the N delete_collection calls with a single batch delete_all - Scope the collection unique constraint error to :name instead of :project_id so the user-facing field reports the conflict - Tone down the merge warning copy while keeping the essential info Adds tests for unsupported/garbage x-api-version values, malformed request bodies, and transaction rollback behavior for sync_collections.
- Log a warning when collection sync fails after merge instead of silently ignoring the error - Remove on_conflict: :nothing from insert_empty_collections since both callers guarantee no duplicates can exist - Rename misleading test to match what it actually verifies
- Use project-scoped get_collection/2 in CollectionsComponent to prevent MatchError when collection names collide across projects - Guard migration rollback with duplicate name check so it fails with a clear message instead of a cryptic Postgres error - Make sync_collections failure fail the merge instead of silently logging, and skip the GitHub commit when collections fail to sync - Bind MapSet.difference outside the comprehension in sync_collections - Add v2 write-path test with run token to cover the primary worker path
Introduces LightningWeb.Plugs.VersionedRouter, a reusable behaviour for header-based API versioning. Instead of `match :*, "/*path"` in the Phoenix router, the collections scope now uses `forward` to a router plug that resolves the version and delegates to V1Routes or V2Routes modules with explicit pattern matching per version. The controller keeps only business logic; routing and param assembly live in the version-specific route modules. The VersionedRouter plug handles fallback controller integration so actions can still return error tuples.
Replaces the 4-file VersionedRouter macro abstraction with a single plug at plugs/collections_router.ex. Inlines the ApiVersion logic so version resolution, routing, and fallback handling all live in one place. The router uses forward instead of match :*.
The controller actions return error tuples from with chains (handled by the fallback controller), but their specs claimed Plug.Conn.t() only. This caused a pattern_match_cov warning in the CollectionsRouter plug. Broaden the return types to Plug.Conn.t() | term().
The full merge sequence (workflow import + collection sync) now lives in the Sandboxes context instead of the LiveView. Any caller gets collection sync automatically, not just the UI.
315f949 to
d1fa4e3
Compare
Description
Sandboxes can't support projects that use collections because collection names are globally unique, and the v1 collections API has no way to disambiguate between a parent and its sandbox. This PR relaxes the uniqueness constraint to per-project and adds a v2 API that resolves collections by
(project_id, name).On the lifecycle side, empty collection records are now cloned into a sandbox when it's provisioned so that
collections.get('my-collection')resolves correctly without copying production data. When a sandbox is merged back into its parent, collection names are synchronised (new collections in the sandbox are created empty in the parent, collections missing from the sandbox are removed from the parent). Collection data is never synced.V1 continues to work for any project that doesn't have name collisions. When a v1 lookup finds the same name across multiple projects, it returns 409 Conflict pointing clients at v2. API version is selected via the
x-api-versionheader (missing or"1"= v1,"2"= v2, anything else = 400 Bad Request). Header-based versioning was chosen in discussion with @stuartc and @josephjclark to keep a single collections base path.Closes #3548
Validation steps
Setup: create a project with a collection
cats@openfn/language-http@latestwith this code:catslisted with a storage size greater than 0catscollection. Open the downloaded JSON file. You should see 10 items, each with afactandlengthfield.This is your parent project with production data. Keep it open.
1. Sandbox provision clones empty collections
Verify: The
catscollection is listed. Click the download button. Open the JSON file -- it should be an empty array[]. The sandbox has the collection, but none of the parent's 10 cat facts were copied.2. Collection data is never merged back
cats. Open the JSON -- you should see 5 breed items.cats.Verify: Open the JSON. You should still see the original 10 cat fact items (with
factandlengthfields). None of the sandbox's 5 breed items (withbreed,country,coatfields) should be present. Collection data is never merged.3. New collections in sandbox are created in parent on merge
cat-breedscat-breeds-- you should see 10 breed items.Verify: You should now see two collections listed:
catsandcat-breeds. Click download oncat-breedsand open the JSON. It should be an empty array[]. The collection was created in the parent, but the sandbox's breed data was not copied over.4. Collections deleted in sandbox are deleted from parent on merge
temp-datacatsandtemp-datalisted.temp-datacollection from the sandboxcatsremains in the sandbox's collections listVerify:
temp-datais no longer listed. Onlycats(andcat-breedsif you did step 3) remain. The collection and all its items were removed from the parent because it was deleted in the sandbox.API test setup
Steps 5-7 use curl to test the API directly. First, get a token and project ID:
http://localhost:4000/projects/abc123-def456-.../w. The UUID after/projects/is your project ID:5. V1 conflict detection (API)
This requires an unmerged sandbox so that
catsexists in two projects. If you merged all sandboxes in previous steps, create a new one first.Verify: The response is 409 Conflict:
{ "error": "Multiple collections found with this name. Use API v2 with a project_id." }Now use v2 with the project ID to resolve it:
Verify: The response is 200 with the parent's cat fact items.
6. Version header validation (API)
Verify: The response is 400 Bad Request:
{ "error": "Unsupported API version: \"99\". Supported versions: 1, 2." }7. V1 backwards compatibility
Delete all sandboxes first (so
catsonly exists in one project), or use a project with a uniquely named collection.Verify: The response is 200 with the collection items, exactly as it worked before this change. No
x-api-versionheader is needed when the collection name is unambiguous.Additional notes for the reviewer
AI Usage
Pre-submission checklist
/reviewwith Claude Code)