feat(cli): add --progress flag for live indexing progress output#108

Open
halindrome wants to merge 10 commits into DeusData:main from halindrome:feat/cli-progress-reporting

halindrome commented Mar 21, 2026

Summary

Adds a `--progress` flag to the CLI that shows live, human-readable progress on stderr while keeping the JSON result on stdout unchanged.

```
$ codebase-memory-mcp cli --progress index_repository '{"repo_path":"/path/to/repo"}'
Discovering files (6402 found)
Starting full index
[1/9] Building file structure
Extracting: 6300/6402 files (98%)
[2/9] Extracting definitions
[3/9] Building registry
[4/9] Resolving calls & edges
[5/9] Detecting tests
[6/9] Scanning HTTP links
[7/9] Analyzing git history
[8/9] Linking config files
[9/9] Writing database
Done: 71645 nodes, 106757 edges (9422 ms)
{"project":"...","status":"indexed","nodes":71645,"edges":106757}
```

Design

No pipeline changes. Implementation registers a custom log sink via `cbm_log_set_sink()` in `run_cli()` that maps existing structured log events (`pass.start`, `pass.timing`, `pipeline.done`, etc.) to human-readable phase labels. When a sink is registered it becomes the sole output handler — the default `fprintf(stderr, ...)` is suppressed, keeping stderr clean.
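The "sole output handler" behavior can be pictured with a minimal sketch. Everything prefixed `demo_` is hypothetical and stands in for the project's real logging API, whose signatures are not shown in this PR; the capture sink exists only to make the behavior observable.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical sink type; the project's real cbm_log_sink_fn may differ. */
typedef void (*demo_log_sink_fn)(const char *line);

static demo_log_sink_fn g_demo_sink = NULL;
static char g_demo_last[256];

/* Install a sink and return the previous one so the caller can restore it. */
demo_log_sink_fn demo_log_set_sink(demo_log_sink_fn fn) {
    demo_log_sink_fn prev = g_demo_sink;
    g_demo_sink = fn;
    return prev;
}

/* Emit one structured line: to the sink if one is set, otherwise to stderr. */
void demo_log(const char *line) {
    if (g_demo_sink != NULL) {
        g_demo_sink(line); /* the sink replaces the default output */
        return;
    }
    fprintf(stderr, "%s\n", line);
}

/* Example sink that records the most recent line instead of printing it. */
void demo_capture_sink(const char *line) {
    snprintf(g_demo_last, sizeof g_demo_last, "%s", line);
}
```

With `demo_capture_sink` installed, a call to `demo_log()` writes nothing to stderr, which mirrors the replace-not-supplement rule described above.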

Key implementation details:

  • `src/cli/progress_sink.c` / `.h`: new log sink; maps structured log events for the 9 pipeline passes to phase labels
  • `src/main.c`: detects `--progress` in argv, temporarily raises the log level to WARN around `cbm_mem_init()` (so even `mem.init` is suppressed; the sink itself is installed only in `run_cli()`), wires SIGINT → `cbm_pipeline_cancel()` for clean Ctrl-C
  • `src/foundation/log.c`: when a sink is registered it replaces (not supplements) the default stderr output
  • Parallel extraction path shows in-place `\r` counter: `Extracting: N/M files (X%)`
  • Incremental no-op path exits cleanly after two lines
  • Node counts sourced from `gbuf.dump` event (fired before `node_by_qn` is freed) rather than `pipeline.done` (which always has `nodes=0` after the hash table free)
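As a rough illustration of the event-to-label mapping, the sketch below pulls a field such as `msg=` out of a space-separated `key=value` line and formats a `[i/9]` label. The helper names and the exact log line layout are assumptions; only the label text and ordering come from the sample output in the Summary.

```c
#include <stdio.h>
#include <string.h>

/* Copy the value of `key` (e.g. "msg=") from a space-separated key=value
 * line into out. Returns 1 if the key was found, 0 otherwise. */
int demo_get_field(const char *line, const char *key, char *out, size_t cap) {
    size_t klen = strlen(key);
    const char *p = line;
    if (cap == 0) return 0;
    while ((p = strstr(p, key)) != NULL) {
        /* Only accept a match at the start of a field. */
        if (p == line || p[-1] == ' ') {
            const char *v = p + klen;
            size_t n = strcspn(v, " \n");
            if (n >= cap) n = cap - 1;
            memcpy(out, v, n);
            out[n] = '\0';
            return 1;
        }
        p += klen;
    }
    return 0;
}

/* Phase labels in the order shown by the sample run. */
static const char *const DEMO_LABELS[] = {
    "Building file structure", "Extracting definitions", "Building registry",
    "Resolving calls & edges", "Detecting tests",        "Scanning HTTP links",
    "Analyzing git history",   "Linking config files",   "Writing database",
};

/* Format "[i/9] label" for a 1-based phase index; returns 0 if out of range. */
int demo_phase_line(int phase, char *out, size_t cap) {
    int total = (int)(sizeof DEMO_LABELS / sizeof DEMO_LABELS[0]);
    if (phase < 1 || phase > total) return 0;
    snprintf(out, cap, "[%d/%d] %s", phase, total, DEMO_LABELS[phase - 1]);
    return 1;
}
```

A sink built this way would call `demo_get_field(line, "msg=", ...)` first, then choose the phase index from whatever pass identifier the real log line carries.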

Test plan

  • `codebase-memory-mcp cli --progress index_repository '{"repo_path":"..."}'` — stderr shows phase labels, stdout has clean JSON
  • Large repo (>500 files, parallel path): `Extracting: N/M files (X%)` counter appears in-place
  • Incremental no-op: only "Discovering files" + "Starting incremental index" appear
  • Without `--progress`: structured `level=info` log lines still appear on stderr (unchanged)
  • 2046 tests pass
  • Ctrl-C during indexing: pipeline cancelled cleanly

🤖 Generated with Claude Code

shanemccarron-maker and others added 10 commits March 21, 2026 15:30
- Declare cbm_progress_sink_init(FILE*), cbm_progress_sink_fini(), cbm_progress_sink_fn()
- cbm_progress_sink_fn matches cbm_log_sink_fn callback signature
- Include guard CBM_PROGRESS_SINK_H, includes <stdio.h> only

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…labels

- cbm_progress_sink_fn() parses msg= tag from structured log lines
- Maps pipeline.discover, pipeline.route, pass.start, pass.timing (9 passes),
  pipeline.done, parallel.extract.progress to human-readable stderr output
- parallel.extract.progress uses \r for in-place terminal updates
- Unknown tags pass through to previous sink (MCP UI routing preserved)
- cbm_progress_sink_init/fini save and restore previous sink

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
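The in-place `\r` update from the commit above might look roughly like the following. This is a sketch with hypothetical `demo_` helpers, not the code in `progress_sink.c`; the `needs_newline` flag plays the role the PR gives to `s_needs_newline`.

```c
#include <stdio.h>

/* Format "Extracting: N/M files (X%)" into out; returns the percentage. */
int demo_format_extract(char *out, size_t cap, long done, long total) {
    int pct = total > 0 ? (int)(done * 100 / total) : 0;
    snprintf(out, cap, "Extracting: %ld/%ld files (%d%%)", done, total, pct);
    return pct;
}

/* Redraw the counter in place: '\r' moves to column 0 without a newline. */
void demo_draw_extract(FILE *fp, long done, long total, int *needs_newline) {
    char line[96];
    demo_format_extract(line, sizeof line, done, total);
    fprintf(fp, "\r%s", line);
    fflush(fp);
    *needs_newline = 1;
}

/* Call before printing the next phase label so it starts on a fresh line. */
void demo_finish_line(FILE *fp, int *needs_newline) {
    if (*needs_newline) {
        fputc('\n', fp);
        *needs_newline = 0;
    }
}
```

Flushing after each redraw matters only if the target stream is buffered; stderr usually is not, but a test harness that passes its own `FILE*` may be.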
…cancel

- Scan argv for --progress before tool dispatch; strip it and shift args
- Add g_cli_pipeline global and cli_sigint_handler (calls cbm_pipeline_cancel)
- When --progress: call cbm_progress_sink_init(stderr) and register SIGINT handler
- For index_repository + --progress: bypass cbm_mcp_handle_tool, call
  cbm_pipeline_new/cbm_pipeline_run directly, set g_cli_pipeline before run
- Assemble JSON result (project/status/nodes/edges) via snprintf, print to stdout
- After run, call cbm_progress_sink_fini(); all progress output goes to stderr

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
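For the "strip it and shift args" step in the commit above, a minimal sketch could be the following. `demo_strip_progress` is a hypothetical name, and the real scan in `src/main.c` may be structured differently.

```c
#include <stddef.h>
#include <string.h>

/* Remove every "--progress" from argv in place, shifting later arguments
 * left. Updates *argc and returns 1 if the flag was present, else 0. */
int demo_strip_progress(int *argc, char **argv) {
    int found = 0;
    int w = 0;
    for (int r = 0; r < *argc; r++) {
        if (strcmp(argv[r], "--progress") == 0) {
            found = 1;
            continue; /* drop the flag */
        }
        argv[w++] = argv[r];
    }
    if (w < *argc) {
        argv[w] = NULL; /* keep the argv[argc] == NULL convention */
    }
    *argc = w;
    return found;
}
```

After the call, the remaining arguments can be handed to tool dispatch unchanged, so tools that do not know about the flag never see it.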
- CLI_SRCS now includes src/cli/progress_sink.c alongside src/cli/cli.c
- Build verified clean: build/c/codebase-memory-mcp produced with no warnings
- All 2042 tests pass with no regressions

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add test_cli_progress_stderr_labels: injects pipeline.discover log event,
  asserts progress sink writes "Discovering" to target FILE*
- Add test_cli_progress_stdout_json: injects pass.start + pipeline.done events,
  asserts "[1/9]" phase label and "Done:" appear; confirms output is not JSON
- Include <cli/progress_sink.h> and <foundation/log.h> headers
- Register both tests in SUITE(cli) under group G

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Normalize alignment whitespace and line-continuation style per project
clang-format configuration.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When a custom log sink is registered via cbm_log_set_sink(), suppress the
default fprintf(stderr, ...) output in both cbm_log() and cbm_log_int().
The sink is now the sole output handler rather than an additive listener.

Also pre-scan for --progress in main() before cbm_mem_init() so the sink is
installed before mem.init fires, keeping stderr completely clean.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- volatile on g_cli_pipeline so signal handler always observes the pointer
- volatile on s_needs_newline to prevent stale-read between worker/main threads
- Fix incorrect PIPE_BUF thread-safety comment (correct reason: per-FILE* locking)
- Add comment documenting --progress silent-ignore for non-index_repository tools
- Rename test_cli_progress_stdout_json → test_cli_progress_phase_labels
- Add test_cli_progress_parallel_extract: exercises \r path + pass.timing flush
- Add test_cli_progress_unknown_tag: verifies unknown events are silently dropped

2046 tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
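For comparison with the `volatile` pointer approach in the commit above, here is a sketch of the flag-based SIGINT pattern from standard C. It is a swapped-in alternative, not what the PR does: the handler only sets a `volatile sig_atomic_t` flag, and the main loop is assumed to poll it and request cancellation itself. All `demo_` names are hypothetical.

```c
#include <signal.h>

/* Set by the handler, polled by the main thread. */
static volatile sig_atomic_t g_demo_cancel = 0;

/* Handler does the minimum: record that Ctrl-C was pressed. */
static void demo_sigint_handler(int signo) {
    (void)signo;
    g_demo_cancel = 1;
}

/* Install the handler; returns 0 on success, -1 on failure. */
int demo_install_sigint(void) {
    return signal(SIGINT, demo_sigint_handler) == SIG_ERR ? -1 : 0;
}

/* Poll from the indexing loop to decide whether to stop early. */
int demo_cancel_requested(void) {
    return g_demo_cancel != 0;
}
```

Whether polling is practical depends on how the pipeline exposes cancellation points, which this PR does not show.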
…utput

Two display bugs found during manual testing against a large repo:

1. Phases 6 and 7 were swapped: HTTP links fires before git history in the
   actual pipeline execution order. Swap their phase numbers in the sink.

2. "Done: 0 nodes" was shown because cbm_gbuf_dump_to_sqlite() frees
   node_by_qn before pipeline.done is logged, making cbm_gbuf_node_count()
   return 0. Fix: capture node/edge counts from the gbuf.dump event (which
   fires with the real counts before the hash table is freed) and use them
   for the Done: display line.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
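The second fix in the commit above amounts to remembering counts from an earlier event and using them later. A minimal sketch follows, assuming hypothetical field names (`nodes=`, `edges=`, `elapsed_ms=`) and `demo_` helpers that are not part of the project.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Counts remembered from the gbuf.dump event. */
typedef struct {
    long nodes;
    long edges;
} demo_counts;

/* Read an integer field such as "nodes=" from a key=value line; -1 if absent. */
static long demo_int_field(const char *line, const char *key) {
    const char *p = strstr(line, key);
    return p ? strtol(p + strlen(key), NULL, 10) : -1;
}

/* Feed each log line through this; returns 1 when a Done line was written. */
int demo_on_event(demo_counts *c, const char *line, char *out, size_t cap) {
    if (strstr(line, "msg=gbuf.dump") != NULL) {
        /* Capture the real counts while they are still available. */
        c->nodes = demo_int_field(line, "nodes=");
        c->edges = demo_int_field(line, "edges=");
        return 0;
    }
    if (strstr(line, "msg=pipeline.done") != NULL) {
        /* Ignore any counts on this event; use the captured ones. */
        long ms = demo_int_field(line, "elapsed_ms=");
        snprintf(out, cap, "Done: %ld nodes, %ld edges (%ld ms)",
                 c->nodes, c->edges, ms);
        return 1;
    }
    return 0;
}
```

The sketch deliberately never reads `nodes=` from the `pipeline.done` line, since that is the value the commit reports as always zero.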
Replace the pre-scan cbm_progress_sink_init(stderr) in main() with a
temporary log level raise to WARN around cbm_mem_init(). This suppresses
the mem.init log line without installing the sink twice — run_cli()
remains the sole owner of the progress sink lifecycle.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
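The temporary log level raise described in the commit above could be sketched as a save, raise, restore sequence. The level enum, the global, and `demo_mem_init()` are all stand-ins; the project's actual log-level API is not shown in this PR.

```c
#include <stdio.h>

typedef enum { DEMO_DEBUG, DEMO_INFO, DEMO_WARN, DEMO_ERROR } demo_level;

static demo_level g_demo_level = DEMO_INFO;
static int g_demo_emitted = 0; /* number of lines actually written */

/* Write the line only if its level meets the current threshold. */
void demo_log_at(demo_level lvl, const char *line) {
    if (lvl < g_demo_level) return;
    fprintf(stderr, "%s\n", line);
    g_demo_emitted++;
}

/* Stand-in for an init routine that logs at info level. */
void demo_mem_init(void) {
    demo_log_at(DEMO_INFO, "level=info msg=mem.init");
}

/* Raise the threshold to WARN around init, then restore the saved level. */
void demo_quiet_mem_init(void) {
    demo_level saved = g_demo_level;
    g_demo_level = DEMO_WARN;
    demo_mem_init();
    g_demo_level = saved;
}
```

Restoring the saved level rather than a hard-coded default keeps the wrapper safe if the level was already changed before startup.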
DeusData added the enhancement (New feature or request) label Mar 21, 2026