Skip to content

feat: comprehensive observability instrumentation across Lance subsystems#6118

Draft
esteban wants to merge 14 commits intolance-format:mainfrom
esteban:feat/lance-io-observability-improvements
Draft

feat: comprehensive observability instrumentation across Lance subsystems#6118
esteban wants to merge 14 commits intolance-format:mainfrom
esteban:feat/lance-io-observability-improvements

Conversation

@esteban
Copy link
Contributor

@esteban esteban commented Mar 6, 2026

Motivation

Lance had fragmented observability: mixed log::/tracing:: usage, no structured in-memory stats for IO latency, cache hit rates, or scheduler behavior, and critical paths (write, commit retry, compaction, index build) were unobservable.

Solution

Three-layer instrumentation (IO, Encoding, Dataset) using a consistent pattern:

  • Atomic counters (Arc<AtomicU64> + Ordering::Relaxed, ~2ns amortized)
  • Typed stats structs with snapshot() for point-in-time reads
  • tracing events with target: fields for subscriber filtering
  • #[non_exhaustive] on new public structs for forward compatibility

No direct OpenTelemetry dependency — users bring their own tracing subscriber.

Crates modified

Crate Changes
lance-core CacheHitMiss, CacheStats, eviction tracking
lance-io IoStats (latency, errors), ScanStats (coalesce/split/backpressure), RetryStats, connection reset tracing, list() error tracking
lance-encoding Decode pipeline timing, buffer operation tracing
lance-index MetricsCollector extension, HNSW build progress, MetricsSnapshot
lance CompactionMetrics (bytes, timing), FragmentDistributionStats, commit tracing, write path tracing, MemWAL log:: -> tracing:: migration

New stats structs

Struct Access Key Fields
IoStats IOTracker::stats() read/write iops, bytes, latency_us, errors
ScanStats ScanScheduler::stats() iops, bytes_read, coalesced/split_iops, backpressure_warnings
CacheHitMiss LanceCache::cache_hit_miss() hits, misses, evictions, hit_ratio()
RetryStats CloudObjectReader::retry_stats() total_retries, total_failures
CompactionMetrics compact_files() return bytes_rewritten, bytes_binary_copied, elapsed_ms_sum
FragmentDistributionStats Dataset::fragment_distribution_stats() count, min/max/avg physical_rows
MetricsSnapshot LocalMetricsCollector::snapshot() comparisons, bytes, partitions probed, search duration

Backward compatibility

  • #[serde(default)] on new fields, #[serde(alias)] for renames
  • Default no-op impls on MetricsCollector trait extensions
  • #[non_exhaustive] on all new public structs

Test plan

  • Unit tests for RetryStats, CompactionMetrics AddAssign, IoTrackingStore list() error counting
  • Integration test for session cache hit/miss sync
  • Full test suites passing: lance-core, lance-io, lance-encoding, lance-index, lance

🤖 Generated with Claude Code

@github-actions github-actions bot added enhancement New feature or request python labels Mar 6, 2026
@github-actions
Copy link
Contributor

github-actions bot commented Mar 6, 2026

PR Review: Comprehensive Observability Instrumentation

Overall this is a well-structured observability addition with consistent patterns (atomic counters, typed stats structs, tracing events). A few items to consider:

P1: HNSW progress counter never reports the final batch

In rust/lance-index/src/vector/hnsw/builder.rs, the progress logging inside for_each_init only fires when local_count >= LOG_INTERVAL (10,000). The residual vectors after the last full batch are never flushed to the shared progress counter or logged. For example, building with 10,005 vectors logs "10,000 / 10,005 done" but never reports completion. Consider adding a final flush after the par_iter completes, or logging a completion event.

P1: min_physical_rows is misleading when physical_rows is None

In Dataset::fragment_distribution_stats() (rust/lance/src/dataset.rs), fragments with physical_rows: None are treated as 0 rows via unwrap_or(0). This means min_physical_rows could report 0 not because a fragment is truly empty, but because its row count is unknown. Consider either skipping None fragments in the min/max computation, or documenting this behavior.

P1: Async latency measurement includes executor scheduling jitter

The Instant::now() / elapsed() pattern in IoTrackingStore (rust/lance-io/src/utils/tracking_store.rs) measures wall-clock time across .await points, which includes tokio task scheduling delays — not just IO time. Under load, reported latencies can be significantly inflated. This is acceptable for observability, but the doc comments on read_latency_us / write_latency_us in IoStats should note that these are end-to-end wall-clock latencies including async scheduling, not pure IO time.

Minor observations

  • HNSW loop starts at 1: The (1..len).into_par_iter() is correct — vector 0 is inserted before the loop at line ~789 (level_count[0].fetch_add). The progress counter initialized at 0 won't account for it, but this is cosmetic since it's just one vector.
  • Eviction listener only counts RemovalCause::Size: This is the right choice — it tracks capacity pressure evictions, not explicit invalidations.
  • #[non_exhaustive] + #[serde(default)]: Good forward-compatibility pattern throughout.
  • saturating_sub for coalesce/split metrics in scheduler: Correct, prevents underflow.

Looks good overall — the three items above are the main ones worth addressing before merge.

Copy link
Contributor

@wjones127 wjones127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, I like most of these changes. I've been meaning to migrate away from log to tracing.

In the future, it would be appreciated to keep PRs to be more focused when possible. It makes reviewing them easier. Plus it also can get individual features through review faster--less controversial changes might get held up by refinements to more controversial changes. Some of these could be their own PR:

  • Cache statistics
  • IO statistics
  • Compaction statistics - this one I'm skeptical of it's usefulness without knowing how it will be used.
  • Index search metrics

@esteban
Copy link
Contributor Author

esteban commented Mar 6, 2026

Thanks for the review @wjones127!

In the future, it would be appreciated to keep PRs to be more focused when possible. It makes reviewing them easier. Plus it also can get individual features through review faster--less controversial changes might get held up by refinements to more controversial changes. Some of these could be their own PR:...

I completely understand, I've been just piling this up in a local branch as I've been found the need to get better observability. Will try to provide mode atomic and focused PRs in the future!

esteban and others added 14 commits March 6, 2026 13:11
…s across all layers

Replace the (u64, u64) tuple return from hit_miss_counts() with a named
CacheHitMiss struct (hits, misses, hit_ratio(), miss_ratio()). Surface
sync cache stat accessors through Dataset and Python bindings so
consumers at every layer can read hit/miss counters without async overhead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add comprehensive cache observability: eviction counter via moka
eviction_listener, max_capacity_bytes field, utilization() method,
and tracing::debug on miss/clear/invalidate_prefix paths. Expose
full CacheStats (including evictions) through Python bindings via
new PyCacheStats pyclass.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…onnection reset tracing

- RetryStats/RetrySnapshot on CloudObjectReader for retry/failure counters
- IO scheduler: coalesced_iops, split_iops, backpressure_events in ScanStats
- IoStats: per-method read/write latency_us and error counters
- Object writer: tracing::warn on connection reset retries
- Migrated log::warn/debug to tracing::warn/debug with structured targets

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- decode_batch: Instant-based elapsed_us + num_rows tracing at DEBUG level
- buffer.rs: tracing::debug for concat, zip_into_one, and misaligned-copy paths

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ld progress

- MetricsCollector: record_bytes_loaded, record_partitions_probed,
  record_candidates_evaluated, record_search_duration_us (default no-op)
- LocalMetricsCollector: atomic fields + dump_into forwarding for new metrics
- MetricsSnapshot struct for typed point-in-time reads
- HNSW builder: progress logging every 10k vectors, search strategy tracing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…Stats

- CompactionMetrics: bytes_rewritten, bytes_binary_copied, elapsed_ms fields
  with #[serde(default)] for backward compatibility
- FragmentDistributionStats: count, total/min/max/avg physical_rows,
  fragments_with_deletions via Dataset::fragment_distribution_stats()
- Migrated compaction logging from log::info to tracing::info

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- docs/src/guide/observability.md: signal taxonomy, tracing targets,
  all stats structs, OTel integration guide, cardinality policy
- rust/examples/src/tracing_setup.rs: working example with env-filter,
  fragment stats, and compaction tracing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tructs, RetryStats tests

- commit_transaction: tracing::warn on conflict retry, non-conflict error,
  and retry exhaustion with structured fields (target: lance::commit)
- CacheHitMiss, CacheStats: add #[non_exhaustive] for API stability
- RetryStats: add unit tests for snapshot, increment, and equality
- Fix session test to avoid struct literal construction of non_exhaustive type

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…crate

- execute_with_retry: tracing::warn on conflict retry and exhaustion
  (target: lance::write::retry) with attempt, elapsed_ms fields
- insert.rs: migrate log::warn/info to tracing::warn/info with structured fields
- merge_insert.rs: migrate log::info to tracing::info

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- write.rs: use tracing::{debug, error, info, warn} (TaskDispatcher,
  RegionWriter, backpressure, shutdown lifecycle)
- manifest.rs: use tracing::{info, warn} (manifest store operations)
- memtable/flush.rs: use tracing::info (flush lifecycle)
- memtable/scanner/builder.rs: log::warn → tracing::warn (8 sites)
- index.rs: log::warn → tracing::warn (test utilities)
- docs: update observability guide with new write/commit targets

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- list()/list_with_offset(): only increment read_iops instead of
  recording fake reads with zero latency/bytes
- Rename backpressure_events → backpressure_warnings to reflect
  debounced warning count, not actual activation count
- Rename elapsed_ms → elapsed_ms_sum to clarify cumulative worker
  time across parallel compaction tasks; add serde alias for compat
- HNSW build: use thread-local counters flushed every 10k to avoid
  contended global atomic in inner parallel loop

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
list() and list_with_offset() streams are now wrapped with inspect()
to increment read_errors on Err items. Previously list-path failures
were invisible to IoStats, contradicting the documented "Failed read
operations" semantics.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- test_list_stream_errors_increment_read_errors: normal list, no errors
- test_list_error_items_counted_as_read_errors: ErrorInjectingStore mock
  exercises IoTrackingStore.list() end-to-end with 3 injected Err items
- test_list_with_offset_error_items_counted: list_with_offset IOPS count

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove buffer-level tracing (too low-level, overhead concern)
- Replace decode timing event with debug_span (reviewer suggestion)
- Remove FragmentDistributionStats (already available via public API)
- Add HNSW build completion log after par_iter (final batch unreported)
- Clarify latency docs include async scheduling jitter

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@esteban esteban force-pushed the feat/lance-io-observability-improvements branch from 58f0493 to 773cdcc Compare March 6, 2026 19:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request python

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants