Skip to content

feat: add count_table_rows, insert_into_table, query_table for DirectoryNamespace#6112

Closed
XuQianJin-Stars wants to merge 1 commit intolance-format:mainfrom
XuQianJin-Stars:feature/dir-namespace-cos-ops
Closed

feat: add count_table_rows, insert_into_table, query_table for DirectoryNamespace#6112
XuQianJin-Stars wants to merge 1 commit intolance-format:mainfrom
XuQianJin-Stars:feature/dir-namespace-cos-ops

Conversation

@XuQianJin-Stars
Copy link
Contributor

close #6111

Add count_table_rows, insert_into_table, query_table for DirectoryNamespace

  • Enable lance-io 'tencent' feature in java/lance-jni/Cargo.toml for COS support
  • Implement count_table_rows with version checkout and predicate filter
  • Implement insert_into_table with append/overwrite modes via Arrow IPC
  • Implement query_table with filter, limit, offset, version support
  • Add 8 unit tests covering all new methods and edge cases

@github-actions github-actions bot added enhancement New feature or request java labels Mar 6, 2026
@github-actions
Copy link
Contributor

github-actions bot commented Mar 6, 2026

PR Review

P0: query_table semantic mismatch with QueryTableRequest

The QueryTableRequest schema (per rest.yaml) has vector and k as required fields — it's designed for vector similarity search, not general-purpose table scanning. This implementation ignores the vector field entirely and repurposes k as a generic row limit for a plain scan. This will produce confusing behavior:

  • Callers must provide a vector (it's required in the schema) even though it's unused.
  • k means "number of nearest neighbors" in the API contract, not "LIMIT".
  • Filter-only queries without vector search are not what this endpoint is designed for.

If the goal is to support simple table scans, consider implementing a different method or at minimum handle the vector field appropriately (e.g., perform actual vector search when a vector is provided, or return an error if no vector is given and the method doesn't support plain scans).

P1: Unnecessary memory copy in insert_into_table

let cursor = Cursor::new(request_data.to_vec());

request_data.to_vec() copies the entire IPC payload. Bytes implements AsRef<[u8]>, so you can use Cursor::new(request_data.as_ref()) or Cursor::new(&request_data[..]) to avoid the allocation.

P1: query_table collects all scan results into memory then re-serializes

let batches: Vec<_> = scanner.try_into_stream().await...try_collect().await...;

For large tables this buffers everything in memory twice (once as RecordBatches, once as the IPC buffer). Consider streaming batches directly to the IPC writer instead of collecting first.

Minor: Unrelated feature flag change

Enabling the tencent feature in java/lance-jni/Cargo.toml is bundled into this PR but is unrelated to the DirectoryNamespace method implementations. Consider separating it into its own PR to keep changes focused.

@github-actions github-actions bot added the python label Mar 6, 2026
@codecov
Copy link

codecov bot commented Mar 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@XuQianJin-Stars XuQianJin-Stars force-pushed the feature/dir-namespace-cos-ops branch from f3ebdde to d5e0b1f Compare March 6, 2026 12:27
…oryNamespace

- Enable lance-io 'tencent' feature in java/lance-jni/Cargo.toml for COS support
- Implement count_table_rows with version checkout and predicate filter
- Implement insert_into_table with append/overwrite modes via Arrow IPC
- Implement query_table with filter, limit, offset, version support
- Add 8 unit tests covering all new methods and edge cases
@XuQianJin-Stars XuQianJin-Stars force-pushed the feature/dir-namespace-cos-ops branch from d12d794 to a54e229 Compare March 7, 2026 02:31
@Xuanwo
Copy link
Collaborator

Xuanwo commented Mar 9, 2026

Hi, thank you @XuQianJin-Stars for working on this. I noticed that this PR combines multiple different goals into one, which makes it difficult to review and cherry-pick. Could you split them into separate PRs instead?

@XuQianJin-Stars
Copy link
Contributor Author

Hi, thank you @XuQianJin-Stars for working on this. I noticed that this PR combines multiple different goals into one, which makes it difficult to review and cherry-pick. Could you split them into separate PRs instead?

separate PRs:
#6131
#6132

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request java python

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: add count_table_rows, insert_into_table, query_table for DirectoryNamespace

2 participants