Conversation
|
Would @win4r mind reviewing this PR when you get a chance? |
rwmjhb
left a comment
There was a problem hiding this comment.
Thanks for this contribution — cloud storage is a natural extension and the implementation is well-structured. The three-layer plumbing (schema → config parse → store) is clean and the lancedb.connect() signature usage is correct. A few items need attention:
Must fix
1. storageOptions values don't go through resolveEnvVars() (Critical — security)
Users will put AWS credentials (access_key_id, secret_access_key) in storageOptions. Currently parsePluginConfig() copies the raw object without calling resolveEnvVars() on each value, unlike apiKey and other sensitive fields. This forces users to hardcode secrets in their config file.
Fix: iterate over storageOptions entries and call resolveEnvVars() on each value before returning, so ${AWS_SECRET_ACCESS_KEY} works as expected.
2. CLI and migration don't receive storageOptions (High)
Several code paths call lancedb.connect() directly without storageOptions:
cli.ts— thereembedcommand opens the DB with a barelancedb.connect(path)src/migrate.ts—loadLegacyData()andcheckForLegacyData()do the same
These will silently fail against cloud buckets that require credentials. The "cloud storage support" story is incomplete without plumbing storageOptions through to these call sites (or at minimum exposing a CLI flag).
Should fix
3. uiHints missing sensitive: true (Medium)
The storageOptions uiHint should be marked "sensitive": true (like embedding.apiKey and retrieval.rerankApiKey), otherwise cloud credentials will show up in cleartext in OpenClaw's config UI and histories.
4. Cloud path detection is fragile (Medium)
dbPath.includes("://") can false-positive on Windows paths like C://lancedb or unusual directory names. Consider a stricter check like:
const isCloud = /^[a-z][a-z0-9+.-]*:\/\//i.test(dbPath.trim());This ensures only real URI schemes (s3://, gs://, az://) bypass local path validation.
5. No test coverage (Medium)
Please add at least:
- A test that cloud paths skip
validateStoragePath/api.resolvePath - A test that
storageOptionswith${ENV_VAR}placeholders get resolved correctly - A test that non-string values in
storageOptionsare rejected byparsePluginConfig
Nice to have
6. Empty storageOptions always passed (Low)
When unconfigured, { storageOptions: {} } is always passed to lancedb.connect(). LanceDB tolerates this, but you could conditionally omit it for cleanliness:
const connectOpts = Object.keys(this.storageOptions).length > 0
? { storageOptions: this.storageOptions }
: undefined;
db = await lancedb.connect(this.config.dbPath, connectOpts);Verdict
Fix #1 (env var substitution) and #2 (CLI/migration passthrough) are blockers. Once those plus the sensitive hint are addressed, this is good to merge.
|
Thanks @rwmjhb a lot for the detailed and thoughtful review! I really appreciate the feedback and have already addressed most of the points. I have a question regarding #2 (passing storageOptions to migration). From my understanding, the LegacyData in the migrate refers to data written by the memory-lance plugin. Currently, the memory-lance plugin does not support cloud storage yet.Given that, I’m wondering if it’s necessary to pass storageOptions to loadLegacyData() and checkForLegacyData() at this moment. Would it make sense to defer this change until the memory-lance plugin actually adds cloud storage support? We can track the migration compatibility in a separate issue or PR later, as there will likely be more details to handle — such as how to infer or pass the correct storageOptions for legacy data. Let me know what you think! |
|
@rwmjhb Gentle reminder for review, thanks! |
Change-Id: I809cd1d768ad818c6c831bed622d1ab42cd29dbc
Change-Id: If806c4459f840650ec52aaaf73064df35de399a0
0e137c5 to
d814969
Compare
|
I re-reviewed the current head. My updated take is:
My remaining concern is scope clarity: the cloud-storage support in this PR currently covers the primary store connection, but not the migration/source-db path. Concretely:
So if the intended scope of this PR is “allow the primary LanceDB memory store to use a cloud backend”, I think the implementation is largely there. Also, this repo’s default branch is currently One more question: what is the main use case for putting the primary LanceDB memory store on cloud storage? My suggestion:
If you go with (2), I don’t have a new blocker. |
|
Thank you @rwmjhb very much for re-reviewing and providing such constructive feedback. As you correctly pointed out, this PR only adds cloud storage support for the primary store at the moment, and does not cover the migration path. The reasons are: Supporting cloud storage for migration would require more detailed design decisions: for example, how to get storageOptions for the legacy store. If we support cross-storage migration (local to cloud, or cross-cloud), the source and target would need different storageOptions, the migration process is triggered during plugin registration, a stage where it’s not possible to retrieve the storageOptions of the legacy store. For these reasons, I prefer to implement storageOptions support in the migration path in a separate follow-up PR. I have rebased against the master branch. Please follow up on #142 , and I will close this PR. Thanks again for your help! |
Add cloud storage support for LanceDB
This PR adds support for connecting to cloud-based LanceDB instances by introducing a new storageOptions configuration parameter.
Changes
Usage
Users can now configure cloud storage connections by providing storage options in the plugin configuration:
This enables the plugin to work with LanceDB Cloud, S3-backed storage, and other remote storage backends without requiring local filesystem access.