Skip to content

feat(DirectoryNamespace): support index and transaction related operations#6196

Open
zhangyue19921010 wants to merge 3 commits intolance-format:mainfrom
zhangyue19921010:namespace-enhance-v1
Open

feat(DirectoryNamespace): support index and transaction related operations#6196
zhangyue19921010 wants to merge 3 commits intolance-format:mainfrom
zhangyue19921010:namespace-enhance-v1

Conversation

@zhangyue19921010
Copy link
Contributor

Index and transaction

  • create_table_index
  • list_table_indices
  • describe_table_index_stats
  • describe_transaction
  • create_table_scalar_index
  • drop_table_index

@github-actions github-actions bot added enhancement New feature or request python java labels Mar 13, 2026
@github-actions
Copy link
Contributor

PR Review: feat(DirectoryNamespace): support index and transaction related operations

P0: find_transaction is O(n) across all versions

When transaction_id is not parseable as a version number (i.e. it's a UUID), find_transaction iterates through every version in reverse, reading each transaction file from storage. For tables with many versions this will be extremely slow — each iteration is a separate I/O call.

for version in versions.into_iter().rev() {
    if let Some(transaction) = dataset
        .read_transaction_by_version(version.version)
        .await
        // ...

Consider adding an index/lookup structure or at minimum documenting this as a known limitation and logging a warning.

P1: Silent integer overflow in stats_field_as_i64

fn stats_field_as_i64(stats: &serde_json::Value, key: &str) -> Option<i64> {
    stats
        .get(key)
        .and_then(|value| value.as_i64().or_else(|| value.as_u64().map(|v| v as i64)))
}

v as i64 silently wraps if the u64 exceeds i64::MAX. Use i64::try_from(v).ok() instead.

P1: All vector index parameters are hardcoded

Every vector index type uses hardcoded build parameters (e.g. num_partitions=1, num_sub_vectors=8, max_iters=50). This means:

  • IVF with 1 partition is effectively a no-op partitioning
  • PQ with 8 sub-vectors and 1 num_bits may not suit the data
  • Users have no way to tune these through the namespace API

If CreateTableIndexRequest doesn't yet carry these fields, consider at least adding a TODO/issue reference. Otherwise, wire through the request fields.

P1: load_dataset validates version after loading the dataset

let dataset = builder.load().await.map_err(|e| { ... })?;  // loads first
if let Some(version) = version {
    if version < 0 { return Err(...); }                      // then validates

Move the negative-version check before builder.load() to avoid unnecessary I/O on invalid input.

Minor

  • paginate_indices mutates its input Vec in place (sort + drain). Consider returning a new Vec or renaming to make the mutation explicit (e.g. sort_and_truncate_indices).

Overall the code is well-structured with good error messages and solid test coverage for the happy paths. The load_dataset helper is a nice refactoring of the repeated DatasetBuilder pattern. Tests cover create/list/describe/drop for both scalar and vector indices.

@codecov
Copy link

codecov bot commented Mar 13, 2026

Codecov Report

❌ Patch coverage is 63.51931% with 255 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
rust/lance-namespace-impls/src/dir.rs 63.51% 222 Missing and 33 partials ⚠️

📢 Thoughts on this report? Let us know!

@zhangyue19921010
Copy link
Contributor Author

PR Review all addressed.

Self::paginate_indices(&mut indices, request.page_token, request.limit);
Ok(ListTableIndicesResponse {
indexes: indices,
page_token: None,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it right to set the page_token to None?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice Catch. Changed!

Copy link
Collaborator

@yanghua yanghua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left one comment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request java python

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants