Skip to content

feat: add cloud storage support#75

Closed
ddupg wants to merge 2 commits intoCortexReach:mainfrom
ddupg:feat-storage-options
Closed

feat: add cloud storage support#75
ddupg wants to merge 2 commits intoCortexReach:mainfrom
ddupg:feat-storage-options

Conversation

@ddupg
Copy link

@ddupg ddupg commented Mar 7, 2026

Add cloud storage support for LanceDB

This PR adds support for connecting to cloud-based LanceDB instances by introducing a new storageOptions configuration parameter.

Changes

  • Plugin configuration: Added storageOptions field to accept key-value string pairs for LanceDB connection options
  • Cloud URL detection: Automatically detects cloud storage URLs (containing ://) and skips local filesystem validation
  • LanceDB connection: Passes storageOptions to the lancedb.connect() call for cloud authentication and configuration
  • Schema updates: Updated plugin JSON schema and UI labels for the new configuration option

Usage

Users can now configure cloud storage connections by providing storage options in the plugin configuration:

{                                                                                                                                                                                                           
  dbPath: s3://my-bucket/path,                                                                                                                                                                              
  storageOptions: {
    access_key_id: your-key,
    secret_access_key: your-secret,
    aws_endpoint: bucket endpoint,
  }                                                                                                                                                                                                         
}                                                                                                                                                                                                           

This enables the plugin to work with LanceDB Cloud, S3-backed storage, and other remote storage backends without requiring local filesystem access.

@ddupg
Copy link
Author

ddupg commented Mar 7, 2026

Would @win4r mind reviewing this PR when you get a chance?

Copy link
Collaborator

@rwmjhb rwmjhb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this contribution — cloud storage is a natural extension and the implementation is well-structured. The three-layer plumbing (schema → config parse → store) is clean and the lancedb.connect() signature usage is correct. A few items need attention:

Must fix

1. storageOptions values don't go through resolveEnvVars() (Critical — security)

Users will put AWS credentials (access_key_id, secret_access_key) in storageOptions. Currently parsePluginConfig() copies the raw object without calling resolveEnvVars() on each value, unlike apiKey and other sensitive fields. This forces users to hardcode secrets in their config file.

Fix: iterate over storageOptions entries and call resolveEnvVars() on each value before returning, so ${AWS_SECRET_ACCESS_KEY} works as expected.

2. CLI and migration don't receive storageOptions (High)

Several code paths call lancedb.connect() directly without storageOptions:

  • cli.ts — the reembed command opens the DB with a bare lancedb.connect(path)
  • src/migrate.tsloadLegacyData() and checkForLegacyData() do the same

These will silently fail against cloud buckets that require credentials. The "cloud storage support" story is incomplete without plumbing storageOptions through to these call sites (or at minimum exposing a CLI flag).

Should fix

3. uiHints missing sensitive: true (Medium)

The storageOptions uiHint should be marked "sensitive": true (like embedding.apiKey and retrieval.rerankApiKey), otherwise cloud credentials will show up in cleartext in OpenClaw's config UI and histories.

4. Cloud path detection is fragile (Medium)

dbPath.includes("://") can false-positive on Windows paths like C://lancedb or unusual directory names. Consider a stricter check like:

const isCloud = /^[a-z][a-z0-9+.-]*:\/\//i.test(dbPath.trim());

This ensures only real URI schemes (s3://, gs://, az://) bypass local path validation.

5. No test coverage (Medium)

Please add at least:

  • A test that cloud paths skip validateStoragePath / api.resolvePath
  • A test that storageOptions with ${ENV_VAR} placeholders get resolved correctly
  • A test that non-string values in storageOptions are rejected by parsePluginConfig

Nice to have

6. Empty storageOptions always passed (Low)

When unconfigured, { storageOptions: {} } is always passed to lancedb.connect(). LanceDB tolerates this, but you could conditionally omit it for cleanliness:

const connectOpts = Object.keys(this.storageOptions).length > 0
  ? { storageOptions: this.storageOptions }
  : undefined;
db = await lancedb.connect(this.config.dbPath, connectOpts);

Verdict

Fix #1 (env var substitution) and #2 (CLI/migration passthrough) are blockers. Once those plus the sensitive hint are addressed, this is good to merge.

@ddupg
Copy link
Author

ddupg commented Mar 8, 2026

Thanks @rwmjhb a lot for the detailed and thoughtful review! I really appreciate the feedback and have already addressed most of the points.

I have a question regarding #2 (passing storageOptions to migration). From my understanding, the LegacyData in the migrate refers to data written by the memory-lance plugin. Currently, the memory-lance plugin does not support cloud storage yet.Given that, I’m wondering if it’s necessary to pass storageOptions to loadLegacyData() and checkForLegacyData() at this moment.

Would it make sense to defer this change until the memory-lance plugin actually adds cloud storage support? We can track the migration compatibility in a separate issue or PR later, as there will likely be more details to handle — such as how to infer or pass the correct storageOptions for legacy data.

Let me know what you think!

@ddupg ddupg requested a review from rwmjhb March 8, 2026 06:07
@ddupg
Copy link
Author

ddupg commented Mar 9, 2026

@rwmjhb Gentle reminder for review, thanks!

ddupg added 2 commits March 10, 2026 13:11
Change-Id: I809cd1d768ad818c6c831bed622d1ab42cd29dbc
Change-Id: If806c4459f840650ec52aaaf73064df35de399a0
@ddupg ddupg force-pushed the feat-storage-options branch from 0e137c5 to d814969 Compare March 10, 2026 05:13
@rwmjhb
Copy link
Collaborator

rwmjhb commented Mar 10, 2026

I re-reviewed the current head. My updated take is:

  • storageOptions env var resolution, the sensitive UI hint, and the primary store cloud connection path all look good now.
  • CI is passing, so no concerns there.

My remaining concern is scope clarity: the cloud-storage support in this PR currently covers the primary store connection, but not the migration/source-db path.

Concretely:

  • src/migrate.ts still scans only local directories in findSourceDatabase()
  • loadLegacyData() still does a bare lancedb.connect(sourceDbPath) without storageOptions

So if the intended scope of this PR is “allow the primary LanceDB memory store to use a cloud backend”, I think the implementation is largely there.
But if the intended scope is broader “cloud storage support” in general, then the PR description is currently stronger than the actual coverage, because migration/cloud source-db is not implemented yet.

Also, this repo’s default branch is currently master, not main, so please retarget the PR base branch to master and re-sync with the latest master.

One more question: what is the main use case for putting the primary LanceDB memory store on cloud storage?
For example, is the goal cross-machine/shared memory, persistence in ephemeral environments, centralized backup, or multi-agent access to the same memory backend?
That context would help clarify whether migration/cloud-source support should be considered in-scope for this PR or left as follow-up work.

My suggestion:

  1. Retarget the PR to master
  2. If migration/cloud source support is out of scope, narrow the PR description to make that explicit
  3. If you want to keep the current broader description, then the migration/source-db path still needs to be completed

If you go with (2), I don’t have a new blocker.

@ddupg
Copy link
Author

ddupg commented Mar 10, 2026

Thank you @rwmjhb very much for re-reviewing and providing such constructive feedback.

As you correctly pointed out, this PR only adds cloud storage support for the primary store at the moment, and does not cover the migration path. The reasons are:
I understand migration/source-db refers to data written by the memory-lancedb plugin. If I’m wrong, please correct me. Since memory-lancedb itself does not yet support cloud storage, implementing cloud support for the source-db migration is not urgent right now.

Supporting cloud storage for migration would require more detailed design decisions: for example, how to get storageOptions for the legacy store. If we support cross-storage migration (local to cloud, or cross-cloud), the source and target would need different storageOptions, the migration process is triggered during plugin registration, a stage where it’s not possible to retrieve the storageOptions of the legacy store.

For these reasons, I prefer to implement storageOptions support in the migration path in a separate follow-up PR.

I have rebased against the master branch. Please follow up on #142 , and I will close this PR.

Thanks again for your help!

@ddupg ddupg closed this Mar 10, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants