This document describes the HTTP API surface of the CAVE authorization system (middle_auth) and what a replacement service must implement to be a drop-in substitute for CAVE services that depend on it.
CAVE authorization is cleanly separated into two components:
- middle_auth: The Flask server providing OAuth login, token management, user/group/permission CRUD, and token validation endpoints.
- middle_auth_client: A Python library installed in every CAVE service, providing Flask decorators (@auth_required, @auth_requires_permission, etc.) that make HTTP callbacks to the middle_auth server on every request.
Every CAVE service delegates auth entirely through middle_auth_client. No service implements its own token validation or permission logic. The decorators call back to middle_auth, cache the result (TTL 300s by default), and gate access based on the response.
Services locate middle_auth via environment variables set in Kubernetes deployment config:
- AUTH_URL: Base URL of the auth server (e.g., cave.example.org/auth). Used by middle_auth_client decorators and by PyChunkedGraph's direct HTTP calls.
- STICKY_AUTH_URL: URL for OAuth browser redirects (may differ for sticky session routing). Used by AnnotationFrameworkInfoService.
Pointing CAVE services at a replacement is a deployment config change — update these env vars.
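As a rough illustration of how a service might turn these variables into request URLs, the sketch below reads AUTH_URL and STICKY_AUTH_URL from an environment mapping. The helper name `auth_endpoint`, the `https://` scheme, and the fallback from STICKY_AUTH_URL to AUTH_URL are assumptions made for the example, not documented middle_auth_client behavior.

```python
import os
from typing import Mapping, Optional


def auth_endpoint(path: str,
                  env: Optional[Mapping[str, str]] = None,
                  sticky: bool = False) -> str:
    """Build a full auth-server URL from deployment environment variables."""
    env = os.environ if env is None else env
    base = env["AUTH_URL"]
    if sticky:
        # Assumption: fall back to AUTH_URL when STICKY_AUTH_URL is unset.
        base = env.get("STICKY_AUTH_URL", base)
    # Assumption: scheme-less values such as "cave.example.org/auth" mean HTTPS.
    if "://" not in base:
        base = "https://" + base
    return base.rstrip("/") + "/" + path.lstrip("/")
```

With AUTH_URL=cave.example.org/auth, `auth_endpoint("/api/v1/user/cache")` yields `https://cave.example.org/auth/api/v1/user/cache`, so switching to a replacement server only changes the environment, not the calling code.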
Tokens reach CAVE services via three mechanisms (checked in this order by DatasetGateway's TokenAuthentication):
- HTTP cookie: dsg_token
- Authorization header: Bearer {token}
- Query parameter: ?dsg_token=...
Note: The original CAVE middle_auth_client uses middle_auth_token as the cookie and query param name. DatasetGateway unifies these to dsg_token. CAVE services using middle_auth_client with Bearer header auth are unaffected; those relying on cookie or query param names need the updated client or config.
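The lookup order described above (cookie, then Authorization header, then query parameter) can be sketched as a small framework-independent function. This is an illustration only: `extract_token` and `TOKEN_NAME` are hypothetical names, and plain dictionaries stand in for the framework's request objects, so header lookup here is case-sensitive.

```python
from typing import Mapping, Optional

TOKEN_NAME = "dsg_token"  # cookie and query parameter name used by DatasetGateway


def extract_token(cookies: Mapping[str, str],
                  headers: Mapping[str, str],
                  query: Mapping[str, str]) -> Optional[str]:
    """Return the first token found, checking cookie, header, then query."""
    # 1. HTTP cookie
    token = cookies.get(TOKEN_NAME)
    if token:
        return token
    # 2. Authorization: Bearer {token}
    scheme, _, value = headers.get("Authorization", "").partition(" ")
    if scheme.lower() == "bearer" and value.strip():
        return value.strip()
    # 3. ?dsg_token=...
    return query.get(TOKEN_NAME) or None
```

Supporting the legacy middle_auth_token name would amount to checking a second key in steps 1 and 3.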
A replacement must implement approximately 10 endpoints that CAVE services actually call. The remaining ~40 management endpoints in middle_auth are only used by its own admin UI and can use whatever API design fits the new platform.
Called on every authenticated request across all CAVE services via middle_auth_client decorators.
GET /api/v1/user/cache is the single most important endpoint. Every @auth_required and @auth_requires_permission decorator calls it to validate the token and retrieve the user's permissions.
Request:
- Header: Authorization: Bearer {token}
Response (200):
{
"id": 42,
"parent_id": null,
"service_account": false,
"name": "username",
"email": "user@example.org",
"admin": false,
"pi": "",
"affiliations": [],
"groups": ["group1", "group2"],
"groups_admin": [],
"permissions": {
"fish2": 2,
"fanc": 1
},
"permissions_v2": {
"fish2": ["view", "edit"],
"fanc": ["view"]
},
"permissions_v2_ignore_tos": {
"fish2": ["view", "edit"],
"fanc": ["view"]
},
"missing_tos": [],
"datasets_admin": ["fish2"]
}
Field notes:
- parent_id / service_account: non-null when the token belongs to a service account (child of a human user)
- permissions: legacy v1 format mapping dataset name to a numeric level (0=none, 1=view, 2=edit); the value is the max level across all permissions for that dataset
- permissions_v2: permissions filtered by TOS acceptance
- permissions_v2_ignore_tos: permissions regardless of TOS acceptance
- missing_tos: list of {dataset_id, dataset_name, tos_id, tos_name} for datasets where the user has permissions but hasn't accepted the required TOS
- groups_admin: groups where the user has the admin role
- affiliations: currently always [] (not yet implemented)
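To make the relationship between the legacy and v2 permission formats concrete, the sketch below derives v1 numeric levels from a v2 mapping by taking the maximum level per dataset. The name-to-level table follows the field notes (view=1, edit=2); the function name `legacy_levels` and the choice to treat unknown permission names as level 0 are assumptions for illustration.

```python
from typing import Dict, List

# Levels as given in the field notes: 0=none, 1=view, 2=edit.
LEVELS = {"view": 1, "edit": 2}


def legacy_levels(permissions_v2: Dict[str, List[str]]) -> Dict[str, int]:
    """Collapse v2 permission names into one numeric level per dataset."""
    return {
        dataset: max((LEVELS.get(name, 0) for name in names), default=0)
        for dataset, names in permissions_v2.items()
    }
```

Applied to the permissions_v2 value in the example response, this reproduces the permissions field shown there.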
Response (401): Token invalid or expired.
Caching: middle_auth_client caches responses client-side for 300 seconds (configurable via TOKEN_CACHE_TTL env var, LRU size via TOKEN_CACHE_MAXSIZE, default 1024).
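The caching behaviour can be pictured as an LRU map whose entries expire after a fixed TTL. The following is a minimal standard-library sketch under that assumption; it is not the middle_auth_client implementation, and `TTLCache` and `cached_user_lookup` are hypothetical names. The clock is injectable so that expiry can be exercised without waiting.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable, Optional


class TTLCache:
    """Tiny LRU cache whose entries expire `ttl` seconds after being set."""

    def __init__(self, maxsize: int = 1024, ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.maxsize = maxsize
        self.ttl = ttl
        self.clock = clock
        self._data: "OrderedDict[Hashable, tuple]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._data[key]          # expired: drop and report a miss
            return None
        self._data.move_to_end(key)      # mark as most recently used
        return value

    def set(self, key: Hashable, value: Any) -> None:
        self._data[key] = (self.clock() + self.ttl, value)
        self._data.move_to_end(key)
        while len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # evict least recently used


def cached_user_lookup(token: str,
                       fetch: Callable[[str], Optional[dict]],
                       cache: TTLCache) -> Optional[dict]:
    """Return the cached user for `token`, calling `fetch` only on a miss."""
    user = cache.get(token)
    if user is None:
        user = fetch(token)
        if user is not None:
            cache.set(token, user)
    return user
```

One consequence worth noting for a replacement: with a 300-second TTL, a revoked token or a changed permission can remain effective in a service for up to five minutes.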
Consumers: Every CAVE service with auth decorators — AnnotationEngine, MaterializationEngine, PyChunkedGraph, SkeletonService, PCGL2Cache, NeuroglancerJsonServer, AnnotationFrameworkInfoService, ProofreadingProgress, guidebook, dash_on_flask.
GET /api/v1/service/{namespace}/table/{table_id}/dataset maps a service table to a dataset name. Called by @auth_requires_permission when a service needs to resolve which dataset a table belongs to for permission checking.
Request:
- Header: Authorization: Bearer {token}
- Path params: namespace (e.g., "aligned_volume", "datastack"), table_id (string)
Response (200):
"fish2"
Consumers: Services using @auth_requires_permission with resource_namespace: MaterializationEngine (~30 endpoints), AnnotationEngine, PyChunkedGraph, SkeletonService, PCGL2Cache, AnnotationFrameworkInfoService, guidebook, dash_on_flask.
Called by users_share_common_group() to check whether two users share a common group.
Request:
- Header: Authorization: Bearer {service_token}
- Path param: user_id (integer)
Response (200): User object with group membership.
Consumers: AnnotationEngine and MaterializationEngine only.
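The group comparison itself reduces to a set intersection. The sketch below assumes the returned user object carries a `groups` list of group names, as the /user/cache response does; the exact shape of this endpoint's response is not specified above, and `share_common_group` is a hypothetical stand-in rather than the library's users_share_common_group().

```python
from typing import Any, Mapping


def share_common_group(user_a: Mapping[str, Any],
                       user_b: Mapping[str, Any]) -> bool:
    """True if the two user objects have at least one group in common."""
    groups_a = set(user_a.get("groups", []))
    # isdisjoint() is True when there is no overlap, so negate it.
    return not groups_a.isdisjoint(user_b.get("groups", []))
```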
Returns display names for a list of user IDs.
Request:
- Header: Authorization: Bearer {token}
- Query param: comma-separated user IDs
Response (200):
[
{"id": 42, "name": "alice"},
{"id": 43, "name": "bob"}
]
Consumers: PyChunkedGraph only (direct HTTP call in get_username_dict()).
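A caller typically needs two small transformations around this endpoint: joining IDs into the comma-separated query value and turning the response array into an ID-to-name lookup. The helpers below are a sketch with hypothetical names (`ids_param`, `to_username_dict`); they do not reproduce PyChunkedGraph's get_username_dict().

```python
from typing import Any, Dict, Iterable, List, Mapping


def ids_param(user_ids: Iterable[int]) -> str:
    """Encode user IDs as the comma-separated query value."""
    return ",".join(str(user_id) for user_id in user_ids)


def to_username_dict(users: List[Mapping[str, Any]]) -> Dict[int, str]:
    """Index the response array by user ID."""
    return {int(user["id"]): user["name"] for user in users}
```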
Returns full user info for a list of user IDs.
Request:
- Header: Authorization: Bearer {token}
- Query param: comma-separated user IDs
Response (200): Array of user objects.
Consumers: PyChunkedGraph only (direct HTTP call in get_userinfo_dict()).
Called when checking whether unauthenticated users can access specific data.
Check if a table has any public entries.
Request: Header: Authorization: Bearer {token}
Response (200): Boolean.
Cached: 300 seconds in middle_auth_client.
Check if a specific root is public.
Request: Header: Authorization: Bearer {token}
Response (200): Boolean.
Cached: 300 seconds in middle_auth_client.
Batch check which roots are public.
Request:
- Headers: Authorization: Bearer {token}, Content-Type: application/json
- Body: JSON array of root IDs
Response (200): Boolean or list of booleans.
Cached: 300 seconds in middle_auth_client.
Required for user login via browser.
Initiates the Google OAuth flow. Returns the authorization URL to programmatic clients (identified by the X-Requested-With header) or redirects the browser.
Query params: redirect (return URL), tos_id (optional ToS to accept).
Google OAuth callback. Exchanges code for token, creates/updates user, sets dsg_token cookie (7-day TTL), redirects to original URL.
Invalidates token and clears cookie.
Consumers: AnnotationFrameworkInfoService redirects users to STICKY_AUTH_URL + /api/v1/logout.
Called by the CAVEclient Python library for programmatic token management.
Generate a new API token for the authenticated user.
Request: Header: Authorization: Bearer {token}
Response (200): New token string.
List all tokens for the current user.
Deprecated but still referenced in CAVEclient.
The scim branch of middle_auth implements SCIM 2.0 (System for Cross-domain Identity Management) endpoints for machine-to-machine provisioning of users, groups, and datasets.
SCIM 2.0 (RFC 7643 / RFC 7644) provides a standardized REST API for identity provisioning. These endpoints enable external identity providers (e.g., Okta, Azure AD) to automatically manage users, groups, and dataset assignments without using the admin UI.
/{URL_PREFIX}/scim/v2
Default: /auth/scim/v2
All SCIM endpoints require a Bearer token with super admin privileges, enforced by scim_auth_required:
Authorization: Bearer {admin_token}
Returns the SCIM service provider configuration, including supported features (patch, bulk, filter, sort, changePassword, etag).
Returns the list of supported resource types: User, Group, and Dataset (custom).
Returns full JSON schemas for all supported resource types.
| Method | Path | Description |
|---|---|---|
| GET | /auth/scim/v2/Users | List users (with filtering and pagination) |
| GET | /auth/scim/v2/Users/{scim_id} | Get a single user |
| POST | /auth/scim/v2/Users | Create a new user |
| PUT | /auth/scim/v2/Users/{scim_id} | Replace a user (full update) |
| PATCH | /auth/scim/v2/Users/{scim_id} | Partially update a user |
| DELETE | /auth/scim/v2/Users/{scim_id} | Deactivate a user |
Schema URNs:
- urn:ietf:params:scim:schemas:core:2.0:User
- urn:ietf:params:scim:schemas:extension:neuroglancer:2.0:User (custom extension with admin, pi, gdprConsent, serviceAccount fields)
Key field mappings:
- userName → email
- displayName / name.formatted → name
- active → is_active (deactivation only; no hard delete)
- externalId → external_id
- id → scim_id (deterministic UUID5 derived from the internal ID)
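The field mappings above can be sketched as a function that converts an incoming SCIM User payload into internal column values. The function name `scim_user_to_internal`, the preference for displayName over name.formatted when both are present, and the default of active=True are assumptions for illustration; the custom extension fields are left out.

```python
from typing import Any, Dict, Mapping


def scim_user_to_internal(payload: Mapping[str, Any]) -> Dict[str, Any]:
    """Translate SCIM core User attributes into internal model fields."""
    # Assumption: displayName wins when both name sources are supplied.
    name = payload.get("displayName") or payload.get("name", {}).get("formatted")
    return {
        "email": payload["userName"],
        "name": name,
        # Assumption: a payload that omits "active" describes an active user.
        "is_active": payload.get("active", True),
        "external_id": payload.get("externalId"),
    }
```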
| Method | Path | Description |
|---|---|---|
| GET | /auth/scim/v2/Groups | List groups (with filtering and pagination) |
| GET | /auth/scim/v2/Groups/{scim_id} | Get a single group (includes members) |
| POST | /auth/scim/v2/Groups | Create a new group |
| PUT | /auth/scim/v2/Groups/{scim_id} | Replace a group (full update) |
| PATCH | /auth/scim/v2/Groups/{scim_id} | Partially update a group (add/remove members) |
| DELETE | /auth/scim/v2/Groups/{scim_id} | Delete a group |
Schema URN: urn:ietf:params:scim:schemas:core:2.0:Group
Key field mappings:
- displayName → name
- members → UserGroup M:M relationship (each member has value = user SCIM ID and display = user name)
| Method | Path | Description |
|---|---|---|
| GET | /auth/scim/v2/Datasets | List datasets (with filtering and pagination) |
| GET | /auth/scim/v2/Datasets/{scim_id} | Get a single dataset |
| POST | /auth/scim/v2/Datasets | Create a new dataset |
| PUT | /auth/scim/v2/Datasets/{scim_id} | Replace a dataset (full update) |
| PATCH | /auth/scim/v2/Datasets/{scim_id} | Partially update a dataset |
| DELETE | /auth/scim/v2/Datasets/{scim_id} | Delete a dataset |
Schema URN: urn:ietf:params:scim:schemas:neuroglancer:2.0:Dataset (custom)
Key field mappings:
- name → name (dataset slug)
- tosId → tos_id (FK to TOS document)
- serviceTables → associated ServiceTable records (service name + table name pairs)
SCIM IDs are deterministic UUID5 values generated from internal integer IDs:
uuid5(SCIM_NAMESPACE, f"{resource_type}:{internal_id}")
Where SCIM_NAMESPACE = uuid.UUID("6ba7b810-9dad-11d1-80b4-00c04fd430c8").
Each resource also supports an externalId field for mapping to the identity provider's internal ID. Lookup priority: externalId first, then scim_id.
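A runnable version of the ID scheme and the lookup priority might look like the sketch below. The namespace value and the name format come from the text above; the literal resource type strings ("User", "Group"), the dictionary-based resources, and the helper names `scim_id` and `find_resource` are assumptions. The namespace happens to equal Python's built-in `uuid.NAMESPACE_DNS`.

```python
import uuid
from typing import Any, Iterable, Mapping, Optional

SCIM_NAMESPACE = uuid.UUID("6ba7b810-9dad-11d1-80b4-00c04fd430c8")


def scim_id(resource_type: str, internal_id: int) -> str:
    """Derive the deterministic SCIM ID for an internal integer ID."""
    return str(uuid.uuid5(SCIM_NAMESPACE, f"{resource_type}:{internal_id}"))


def find_resource(resources: Iterable[Mapping[str, Any]],
                  identifier: str) -> Optional[Mapping[str, Any]]:
    """Look up by external_id first, then fall back to scim_id."""
    resources = list(resources)
    for key in ("external_id", "scim_id"):
        for resource in resources:
            if resource.get(key) == identifier:
                return resource
    return None
```

Because the ID is a pure function of resource type and internal ID, it can be recomputed at any time and backfilled for existing rows without coordination.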
Supports RFC 7644 filter expressions on list endpoints via the filter query parameter:
GET /auth/scim/v2/Users?filter=userName eq "user@example.org"
GET /auth/scim/v2/Groups?filter=displayName co "admin"
Supported operators: eq, ne, co (contains), sw (starts with), ew (ends with), pr (present), gt, ge, lt, le. Logical operators and, or, and not are supported.
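To give a feel for how such filters are evaluated, here is a deliberately small sketch that handles a single `attribute operator "value"` comparison with the string operators eq, ne, co, sw, and ew. It is not the middle_auth parser: logical operators, pr, the ordering operators, and nested attribute paths are omitted, comparisons are case-sensitive for simplicity, and the names `parse_filter` and `matches` are hypothetical.

```python
import re
from typing import Any, Mapping, Tuple

_FILTER_RE = re.compile(
    r'^\s*([\w.]+)\s+(eq|ne|co|sw|ew)\s+"([^"]*)"\s*$', re.IGNORECASE
)

_OPS = {
    "eq": lambda actual, value: actual == value,
    "ne": lambda actual, value: actual != value,
    "co": lambda actual, value: value in actual,
    "sw": lambda actual, value: actual.startswith(value),
    "ew": lambda actual, value: actual.endswith(value),
}


def parse_filter(expression: str) -> Tuple[str, str, str]:
    """Split a simple filter into (attribute, operator, value)."""
    match = _FILTER_RE.match(expression)
    if match is None:
        raise ValueError(f"unsupported filter: {expression!r}")
    attribute, operator, value = match.groups()
    return attribute, operator.lower(), value


def matches(resource: Mapping[str, Any], expression: str) -> bool:
    """Evaluate a simple filter against one resource dictionary."""
    attribute, operator, value = parse_filter(expression)
    actual = resource.get(attribute)
    if not isinstance(actual, str):
        # A missing or non-string attribute only satisfies "ne".
        return operator == "ne"
    return _OPS[operator](actual, value)
```

A list endpoint would apply `matches` to each candidate before paginating, or translate the parsed triple into a database query instead.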
SCIM uses 1-based pagination with startIndex and count parameters:
GET /auth/scim/v2/Users?startIndex=1&count=50
- startIndex defaults to 1 (1-based index)
- count defaults to 100, max 1000
- count=0 returns only totalResults (per RFC 7644 §3.4.2.4)
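The pagination rules can be sketched as a function over an in-memory list. The defaults and limits are taken from the bullets above; treating a startIndex below 1 as 1 follows RFC 7644, while the function name `paginate` and the decision to omit every field except schemas and totalResults when count=0 are assumptions.

```python
from typing import Any, Dict, Sequence

LIST_RESPONSE_URN = "urn:ietf:params:scim:api:messages:2.0:ListResponse"
DEFAULT_COUNT = 100
MAX_COUNT = 1000


def paginate(items: Sequence[Any], start_index: int = 1,
             count: int = DEFAULT_COUNT) -> Dict[str, Any]:
    """Build a SCIM ListResponse page from a full result sequence."""
    start_index = max(1, start_index)          # values below 1 mean 1
    count = max(0, min(count, MAX_COUNT))      # clamp to the allowed range
    response: Dict[str, Any] = {
        "schemas": [LIST_RESPONSE_URN],
        "totalResults": len(items),
    }
    if count == 0:
        return response                        # caller only wants the total
    offset = start_index - 1                   # convert to 0-based offset
    page = list(items[offset:offset + count])
    response.update(
        itemsPerPage=len(page), startIndex=start_index, Resources=page
    )
    return response
```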
Response format:
{
"schemas": ["urn:ietf:params:scim:api:messages:2.0:ListResponse"],
"totalResults": 150,
"itemsPerPage": 50,
"startIndex": 1,
"Resources": [...]
}
SCIM errors use the application/scim+json content type:
{
"schemas": ["urn:ietf:params:scim:api:messages:2.0:Error"],
"status": "404",
"scimType": "invalidValue",
"detail": "Resource not found"
}
The SCIM implementation adds the following fields to existing models:
| Model | Field | Type | Description |
|---|---|---|---|
| User | scim_id | String(36), unique, indexed | UUID5 SCIM identifier |
| User | external_id | String(255), unique, indexed | External system identifier |
| Group | scim_id | String(36), unique, indexed | UUID5 SCIM identifier |
| Group | external_id | String(255), unique, indexed | External system identifier |
| Dataset | scim_id | String(36), unique, indexed | UUID5 SCIM identifier |
| Dataset | external_id | String(255), unique, indexed | External system identifier |
The following endpoint groups are only called by middle_auth's own admin UI. A replacement can implement equivalent functionality with any API design:
- Group CRUD: POST/GET/PUT/DELETE /api/v1/group/...
- Dataset CRUD: POST/GET/PUT/DELETE /api/v1/dataset/...
- Permission CRUD: POST/GET/PUT /api/v1/permission/...
- Service account management: POST/GET/PUT/DELETE /api/v1/service_account/...
- Terms of Service management: POST/GET/PUT /api/v1/tos/...
- Service-table-dataset mapping management: POST/DELETE /api/v1/service/{service}/table/{table}/dataset/{dataset}
- Redis debugging: GET /api/v1/redis/...
- User admin operations: POST/PUT/DELETE /api/v1/user/... (the admin CRUD, not /user/cache, which is critical)
These are the middle_auth_client decorators used by CAVE services. A replacement that keeps the middle_auth_client library (or a fork) needs to satisfy the HTTP calls these decorators make.
| Decorator | HTTP Calls | Used By |
|---|---|---|
| @auth_required | GET /api/v1/user/cache | All 10 services |
| @auth_requires_permission(perm, table_arg, resource_namespace) | GET /api/v1/user/cache + GET /api/v1/service/{ns}/table/{id}/dataset | MaterializationEngine (30+), PyChunkedGraph, SkeletonService, AnnotationEngine, PCGL2Cache, AnnotationFrameworkInfoService, guidebook, dash_on_flask |
| @auth_requires_admin | GET /api/v1/user/cache (checks admin field) | MaterializationEngine, AnnotationEngine, AnnotationFrameworkInfoService, NeuroglancerJsonServer |
| @auth_requires_dataset_admin | GET /api/v1/user/cache + GET /api/v1/service/{ns}/table/{id}/dataset (checks datasets_admin) | MaterializationEngine (2 endpoints) |
| @auth_requires_group(group) | GET /api/v1/user/cache (checks groups field) | Not currently used by any service |
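Once the /user/cache response and the resolved dataset name are in hand, each decorator's check reduces to a lookup in the user object. The predicates below sketch that logic against the response fields documented earlier; the function names are hypothetical, and reading permissions from permissions_v2 (rather than the legacy numeric levels) is an assumption about what the library consults.

```python
from typing import Any, Mapping


def has_permission(user: Mapping[str, Any], dataset: str, permission: str) -> bool:
    """Check a named permission on a dataset (assumes permissions_v2)."""
    return permission in user.get("permissions_v2", {}).get(dataset, [])


def is_admin(user: Mapping[str, Any]) -> bool:
    """Check the global admin flag."""
    return bool(user.get("admin", False))


def is_dataset_admin(user: Mapping[str, Any], dataset: str) -> bool:
    """Check whether the dataset appears in datasets_admin."""
    return dataset in user.get("datasets_admin", [])


def in_group(user: Mapping[str, Any], group: str) -> bool:
    """Check membership in a named group."""
    return group in user.get("groups", [])
```

For a replacement server this means the fields admin, groups, datasets_admin, and the permission maps must be populated accurately, because the client-side checks trust them as returned.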
Implement the ~10 endpoints above with the same request/response contract. The middle_auth_client decorators don't care what's behind the URLs. This is the least disruptive path: no CAVE service code changes are required, only deployment config (the AUTH_URL env var). Note: the SCIM 2.0 endpoints add 21 additional provisioning endpoints (3 discovery + 6 per resource type × 3 resource types) for machine-to-machine management.
If you want to change the auth model (e.g., local JWT validation instead of per-request HTTP callback to /user/cache), fork middle_auth_client to do local token validation. This eliminates the HTTP round-trip on every request but requires updating the dependency in every CAVE service.
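As an illustration of what local validation could involve, the sketch below signs and verifies an HS256 JSON Web Token using only the standard library. Everything specific here is an assumption: the text does not say that middle_auth tokens are JWTs, and the shared-secret scheme, the `exp` claim requirement, and the function names are chosen for the example. A production fork would more likely rely on a maintained JWT library and asymmetric keys.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    # Restore the padding that JWTs strip from base64url segments.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Issue a compact HS256 token (what the auth server would do)."""
    header = {"alg": "HS256", "typ": "JWT"}
    parts = [
        _b64url_encode(json.dumps(obj, separators=(",", ":")).encode("utf-8"))
        for obj in (header, claims)
    ]
    signature = hmac.new(secret, ".".join(parts).encode("ascii"),
                         hashlib.sha256).digest()
    return ".".join(parts + [_b64url_encode(signature)])


def verify_hs256(token: str, secret: bytes,
                 now: Optional[float] = None) -> Optional[dict]:
    """Return the claims if the token is authentic and unexpired, else None."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(payload_b64))
        signature = _b64url_decode(signature_b64)
    except ValueError:
        return None  # wrong segment count, bad base64, or bad JSON
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        return None  # never accept an algorithm chosen by the token
    if not isinstance(claims, dict):
        return None
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode("ascii"),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return None
    expires_at = claims.get("exp")
    current = time.time() if now is None else now
    if not isinstance(expires_at, (int, float)) or current >= expires_at:
        return None
    return claims
```

The trade-off is that permissions embedded in a token stay fixed until it expires, so short lifetimes or a revocation check would be needed to match the freshness that the /user/cache callback provides.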
A central auth service handling multiple CAVE deployments (and non-CAVE services) needs to distinguish which dataset a request belongs to without endpoint conflicts.
Add the dataset and service type as a path prefix. The original /api/v1/... paths remain unchanged beneath it:
basedomain.org/fish2/cave/api/v1/user/cache
basedomain.org/fanc/cave/api/v1/user/cache
basedomain.org/other-service/custom/api/v1/...
Each CAVE deployment sets its AUTH_URL to include the prefix:
AUTH_URL=basedomain.org/fish2/cave
The auth service parses the prefix to determine dataset context. Non-CAVE services get their own prefixes with no conflict.
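Parsing the prefix could be as simple as splitting off the first two path segments. The sketch below assumes the layout shown above (dataset, then service type, then the original path); `RoutedRequest` and `split_prefix` are hypothetical names.

```python
from typing import NamedTuple


class RoutedRequest(NamedTuple):
    dataset: str
    service_type: str
    path: str  # original API path, e.g. /api/v1/user/cache


def split_prefix(path: str) -> RoutedRequest:
    """Split /{dataset}/{service_type}/... into context and remaining path."""
    parts = path.lstrip("/").split("/", 2)
    if len(parts) < 3 or not parts[0] or not parts[1]:
        raise ValueError(f"path has no dataset/service prefix: {path!r}")
    dataset, service_type, rest = parts
    return RoutedRequest(dataset, service_type, "/" + rest)
```

For example, `split_prefix("/fish2/cave/api/v1/user/cache")` gives dataset `fish2`, service type `cave`, and path `/api/v1/user/cache`, which can then be dispatched to the unchanged endpoint handlers.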
Advantages:
- Single domain, single TLS certificate
- Standard reverse proxy routing
- AUTH_URL already supports path prefixes, so no code changes are needed
- A fork of middle_auth_client is not required if the proxy strips the prefix before forwarding
Keep paths identical (/api/v1/...) and use separate AUTH_URL values per deployment. A reverse proxy routes based on prefix and injects the dataset context as a header:
AUTH_URL=basedomain.org/fish2 → proxy strips /fish2, adds X-Dataset: fish2 header
AUTH_URL=basedomain.org/fanc → proxy strips /fanc, adds X-Dataset: fanc header
The auth service receives unmodified /api/v1/... paths with dataset context in a header. No CAVE code changes or middle_auth_client fork required.
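In production a reverse proxy would perform the strip-and-inject step, but the same transformation can be emulated in-process for testing with a small WSGI middleware. The sketch below is an assumption-laden stand-in for that proxy: the class name `DatasetPrefixMiddleware` and the explicit list of known datasets are hypothetical, and any client-supplied X-Dataset header is discarded so that only the prefix determines the dataset.

```python
from typing import Callable, Iterable


class DatasetPrefixMiddleware:
    """Strip a leading /{dataset} segment and expose it as X-Dataset."""

    def __init__(self, app: Callable, datasets: Iterable[str]) -> None:
        self.app = app
        self.datasets = frozenset(datasets)

    def __call__(self, environ: dict, start_response: Callable):
        environ = dict(environ)
        # Never trust an X-Dataset header sent by the client.
        environ.pop("HTTP_X_DATASET", None)
        path = environ.get("PATH_INFO", "")
        first, _, rest = path.lstrip("/").partition("/")
        if first in self.datasets:
            environ["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + "/" + first
            environ["PATH_INFO"] = "/" + rest
            environ["HTTP_X_DATASET"] = first  # WSGI form of "X-Dataset"
        return self.app(environ, start_response)
```

Wrapping a Flask application would then look like `app.wsgi_app = DatasetPrefixMiddleware(app.wsgi_app, ["fish2", "fanc"])`, after which handlers can read the dataset from the X-Dataset request header.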
Encoding the dataset in the subdomain (fish2-cave.basedomain.org) works but requires wildcard DNS records, wildcard TLS certificates, and subdomain parsing logic. The path-based approaches are simpler operationally.
- Auth server: middle_auth
- Client library: middle_auth_client
- Token validation and caching: middle_auth_client/decorators.py (user_cache_http(), @auth_required)
- Permission checking: middle_auth_client/decorators.py (@auth_requires_permission, dataset_from_table_id())
- PyChunkedGraph direct calls: pychunkedgraph/app/app_utils.py (get_username_dict(), get_userinfo_dict())
- CAVEclient endpoint definitions: caveclient/endpoints.py (auth_endpoints_v1)
- Environment variable configuration: Set via Kubernetes deployment config (see CAVEdeployment)