feat: implement DuckDB filesystem integration for Vortex file handling#6198
feat: implement DuckDB filesystem integration for Vortex file handling#6198iceboundrock wants to merge 10 commits intovortex-data:developfrom
Conversation
CodSpeed Performance ReportMerging this PR will not alter performanceComparing Summary
Footnotes
|
0ax1
left a comment
There was a problem hiding this comment.
Thanks for taking the time to look into this. There's a couple of issues around thread safety / locking, error handling and memory leaks.
If you used AI to assist in writing the code for your PR please mention this in your PR description (see: https://github.com/vortex-data/vortex/blob/develop/CONTRIBUTING.md).
| |_| {} | ||
| ); | ||
|
|
||
| #[derive(Clone, Copy)] |
There was a problem hiding this comment.
Why do we need a separate type (SendableClientContext) given that Send + Sync is set on ClientContext directly now?
There was a problem hiding this comment.
Because it separates ownership/lifetime semantics from thread-safe handle passing.
ClientContext is a wrapper! type with owned state and Drop logic. It is not Copy and should not be casually moved across async boundaries or spawned tasks.
On the other hand, SendableClientContext is a tiny, Copy wrapper around the raw duckdb_vx_client_context pointer with no destructor. It’s just a handle, which is ideal for passing into async or spawn_blocking code.
There was a problem hiding this comment.
Fair enough. Let's add a SAFETY comment on the wrapper stating that the same guarantees apply as for ClientContext.
| vortex_bail!("GCS glob expansion not yet implemented") | ||
| } | ||
| _ => { | ||
| let paths = glob(file_glob.as_ref()) |
There was a problem hiding this comment.
Shouldn't we also hook into the duckdb_fs_glob for local paths? Or asked differently, should we make any assumptions at all here about globbing details of duckdb_fs_glob but rather unconditionally delegate to duckdb? Whether or not gcs is supported is now a Duckdb implementation detail.
There was a problem hiding this comment.
My suggestion is to just do this let file_urls = duckdb_fs_glob(..).
0ax1
left a comment
There was a problem hiding this comment.
Almost there, thanks for taking the time to address the comments!
Co-authored-by: Alexander Droste <alexander.droste@protonmail.com> Signed-off-by: Ruoshi Li <iceboundrock@msn.com>
| fn pushdown_complex_filter( | ||
| bind_data: &mut Self::BindData, | ||
| expr: &duckdb::Expression, | ||
| expr: &crate::duckdb::Expression, |
| typedef struct { | ||
| const char **entries; | ||
| size_t count; | ||
| } duckdb_vx_string_list; |
There was a problem hiding this comment.
We only use this as a URI list, and this currently lives in filesystem.hpp. Let's rename this to duckdb_vx_uri_list to be explicit.
Uh oh!
There was an error while loading. Please reload this page.