Post-refactor cleanup: security fixes, skill restructure, stale reference removal#28
Conversation
Security (3 medium-severity): - Block find -exec/-execdir/-delete/-ok in ExecTool (injection risk) - Add "--" before grep pattern to prevent flag injection - Replace ad-hoc path validation in MCP handlers with safepath.SafeJoin - Use safepath.SafeJoin in policy builtins resolvePath (traversal risk) - Change crash log permissions from 0644 to 0600 Code quality: - Add rows.Err() checks after all 8 rows.Next() loops in codeintel/repository.go - Replace err == io.EOF with errors.Is(err, io.EOF) in 3 files - Fix out.Close() error handling in memory_export.go (named return) Freshness review fixes (Greptile PR #26 feedback): - Fix decay discontinuity: unified formula (1.0 - ratio * 0.8) replaces hardcoded 0.2 all-missing + 0.4 formula partial-missing - Add cache eviction (purge entries >2x TTL when cache exceeds 1000) - Convert global cache to struct-based statCache (testable, bounded) - Add TODO(freshness-level2) comments for dead code paths - Mark freshness plan doc as "implemented with deviations" - Add TestDecaySmoothCurve test verifying no discontinuities
- Updated migration tests to clarify directory structure and improve comments for better understanding. - Enhanced error handling in the TUI by treating agent completion warnings as non-fatal, allowing for smoother user experience. - Improved list view rendering by adding workspace checks and refactoring header rendering for better clarity. - Introduced new skills for TaskWing, including detailed SKILL.md files for various commands to enhance project knowledge and workflow. - Added functionality to simplify code while preserving behavior, ensuring that optimization does not bypass necessary gates. - Implemented status command to provide current task progress and acceptance criteria, improving user awareness of task status.
- Remove cmd/plan.go, cmd/goal.go, and plan TUI (1,982 lines deleted) - MCP plan tool (clarify/generate/decompose/expand/finalize/audit) unchanged - Planning prompt now scales task count dynamically to goal complexity - Tasks include self-contained context (files, patterns, constraints, tech stack) - Task enricher always provides baseline project context (constraints + decisions) - Fix stop hook reliability: sync TasksCompleted from DB on each invocation - Add session save retry logic to prevent stale state from failed writes - Clear stale CurrentTaskID when DB lookup fails
- policy/builtins.go: use safepath.ValidateAbsPath for absolute paths, fixes prefix collision bypass (e.g. /project-evil matching /project) - freshness/freshness.go: add evictOldest fallback when all cache entries are fresh but cache exceeds max size during burst scenarios - mcp/handlers.go: restore absolute path support in validateAndResolvePath via safepath.ValidateAbsPath, fixing regression for MCP clients - agents/impl/analysis_deps.go: use filepath.Join instead of string concat - bootstrap/factory.go: use safepath.SafeJoin for dependency file checks
…improvements - Unified GetProjectContext replaces 3 separate retrieval paths (ContextRetriever, TaskEnricher, buildTaskContext) - Bootstrap: fix duplicate "Bootstrap complete!" in multi-repo, batch link warnings into single summary, clean verification output - Bootstrap: per-service project context in workspace mode, progress callbacks, repo count accuracy - Models: add gpt-5.4/5.4-mini/5.4-nano, claude-opus-4-6/sonnet-4-6, gemini-3.1, remove deprecated models - Models: newest-first ordering in selection UI - UI: grouped knowledge output, freshness indicators, consistent header boxes across all commands - UI: markdown rendering in ask --answer output, adaptive width - Skills: add /taskwing:context, remove /taskwing:debug and /taskwing:simplify, fix stale file pruning - Skills: clarify questions presented to user (not auto-answered), auto-start first task after plan creation - Hooks: stop hook syncs TasksCompleted from DB, session save retry, verbose logs gated behind --debug - MCP: behavioral CLAUDE.md instructions tell AI tools when to use TaskWing - Prompts: generic and directive (no hardcoded tech stack examples), dynamic task count scaling - Fix: read_file tool returns helpful message for directories instead of crashing ReAct agent - Fix: auto-regenerate slash commands on MCP server start after brew upgrade - Fix: truncateString consolidated to utils.Truncate, nil guards in context.go, brief.go pluralization - Remove: CLI plan/goal/slash commands, docs/_partials, sync-docs scripts, stale doc references
Update CLI output, hints, comments, and docs to reflect the 8->4 skill restructure. Replace references to /taskwing:ask, /taskwing:status, /taskwing:explain, and /taskwing:remember with their current equivalents (MCP tools or /taskwing:context).
| // hasAnyDependencyFile checks if at least one common dependency manifest exists. | ||
| func hasAnyDependencyFile(basePath string) bool { | ||
| for _, name := range commonDepFiles { | ||
| if _, err := os.Stat(filepath.Join(basePath, name)); err == nil { |
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 4 days ago
In general, to fix uncontrolled path usage, ensure that any filesystem operation using a path influenced by user input first validates or constrains that path to a known-safe directory (or otherwise normalizes and checks it). In this codebase, safepath.ValidateAbsPath is already the standard mechanism for doing that. The best approach here is to validate basePath inside hasAnyDependencyFile before using it with filepath.Join and os.Stat. That makes the dependency-agent logic robust even if input.BasePath is ever used from another call-site that does not perform validation, and also signals to static analysis that the path has been sanitized.
Concretely, in internal/agents/impl/analysis_deps.go, update hasAnyDependencyFile so that it first calls safepath.ValidateAbsPath with a trusted root (the process working directory is a reasonable default here) and the provided basePath. If validation fails, the function should return false (no dependency files found) rather than attempting any filesystem access. If validation succeeds, the loop should use the validated path when joining with name. The package already imports internal/safepath, so no new imports are required. No changes are needed in internal/server/handlers.go or internal/bootstrap/runner.go, since they already pass through sanitized values; the fix is fully localized to analysis_deps.go and does not alter functionality for legitimate inputs.
| @@ -278,8 +278,19 @@ | ||
|
|
||
| // hasAnyDependencyFile checks if at least one common dependency manifest exists. | ||
| func hasAnyDependencyFile(basePath string) bool { | ||
| // Ensure the basePath is a validated absolute path rooted at the current working directory. | ||
| // This is a defensive check in case hasAnyDependencyFile is called with an unvalidated path. | ||
| cwd, err := os.Getwd() | ||
| if err != nil { | ||
| return false | ||
| } | ||
| validBasePath, err := safepath.ValidateAbsPath(cwd, basePath) | ||
| if err != nil { | ||
| return false | ||
| } | ||
|
|
||
| for _, name := range commonDepFiles { | ||
| if _, err := os.Stat(filepath.Join(basePath, name)); err == nil { | ||
| if _, err := os.Stat(filepath.Join(validBasePath, name)); err == nil { | ||
| return true | ||
| } | ||
| } |
| if err != nil { | ||
| continue | ||
| } | ||
| if _, err := os.Stat(p); err == nil { |
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
Copilot Autofix
AI 4 days ago
Copilot could not generate an autofix suggestion
Copilot could not generate an autofix suggestion for this alert. Try pushing a new commit or if the problem persists contact support.
| } | ||
| } | ||
| // Also check for dependency files in subdirectories (monorepo) | ||
| if entries, err := os.ReadDir(basePath); err == nil { |
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 4 days ago
In general, to fix this kind of issue you must avoid using user-controlled path strings directly in file-system APIs. Either: (a) validate and normalize the path and ensure it stays within a safe root, or (b) when you need to construct subpaths, use a safe-join routine that enforces that constraint.
In this specific case, hasDependencyFiles already uses safepath.SafeJoin(basePath, ...) when forming paths to check with os.Stat, but falls back to os.ReadDir(basePath) when enumerating subdirectories. To align with the existing safety approach and remove the tainted sink, we can rework the monorepo scan to avoid calling os.ReadDir on basePath directly. Instead, we can:
- Use
filepath.WalkDirstarting atbasePathto traverse directories; this call uses the samebasePath, but only once, and we immediately sanitize any derived paths before using them withos.Stat. - For each directory visited up to one level deep (to preserve the current “project root + immediate subdirectories” behavior), we:
- Join the directory path and each manifest name via
safepath.SafeJoin. os.Statthat safe-joined path as before.
- Join the directory path and each manifest name via
This change removes the direct os.ReadDir(basePath) sink; every per-file check continues to use SafeJoin, and traversal is constrained to a single WalkDir call at the already-validated basePath. Functionality remains effectively the same: the root is checked, and then each immediate subdirectory is checked for any of the dependency manifest files.
Concretely:
- In
internal/bootstrap/factory.go, extend the imports to include"io/fs". - Replace the
os.ReadDir(basePath)block (lines 72–88) inhasDependencyFileswith afilepath.WalkDiroverbasePaththat:- Treats
basePathas depth 0, its direct children as depth 1, and skips deeper directories. - For each directory at depth 0 or 1, attempts
safepath.SafeJoinwith each manifest name and usesos.Statto detect existence. - Returns early (by capturing a local variable and using an
errsentinel) when a manifest is found.
- Treats
This keeps all path construction for stats under the protection of SafeJoin and avoids the specific flagged pattern.
| if err != nil { | ||
| continue | ||
| } | ||
| if _, err := os.Stat(p); err == nil { |
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression High
Copilot Autofix
AI 4 days ago
Copilot could not generate an autofix suggestion
Copilot could not generate an autofix suggestion for this alert. Try pushing a new commit or if the problem persists contact support.
Replace removed taskwing goal references, rewrite CHANGELOG unreleased section to reflect consolidated slash commands and MCP-first planning.
- Save task ID before clearing so debug log prints actual value - Use safepath.SafeJoin for tw-* directory cleanup to prevent traversal - Log wave 1 agent errors instead of silently discarding them - Remove dead legacy marker check (duplicate of primary path) - Fix plan_id -> clarify_session_id reference in plan skill
| sb.WriteString("## Relevant Context\n") | ||
| for _, node := range pc.RelevantNodes { | ||
| if node.Node == nil { | ||
| continue |
There was a problem hiding this comment.
GetProjectContext always returns a nil error
The function signature declares (*ProjectContext, error), but every internal error (DB failure, search failure) is only logged to SearchLog — the error return is always nil. This breaks the standard Go error contract: callers who check if err != nil can never distinguish a fully successful retrieval from a complete database failure, silently receiving an empty ProjectContext instead.
For example, in cmd/hook.go:
pc, err := knowledge.GetProjectContext(ctx, ks, opts)
if err != nil {
return task.FormatRichContext(ctx, nextTask, plan, nil) // fallback
}This fallback will never fire even if svc.ListNodesByType returns a driver error or the SQLite file is corrupted.
Either propagate the first critical error encountered (e.g., when IncludeConstraints is true and the DB call fails), or document clearly with a comment that this function never errors and callers should inspect SearchLog instead.
Prompt To Fix With AI
This is a comment left during a code review.
Path: internal/knowledge/context.go
Line: 172-175
Comment:
**`GetProjectContext` always returns a nil error**
The function signature declares `(*ProjectContext, error)`, but every internal error (DB failure, search failure) is only logged to `SearchLog` — the error return is always `nil`. This breaks the standard Go error contract: callers who check `if err != nil` can never distinguish a fully successful retrieval from a complete database failure, silently receiving an empty `ProjectContext` instead.
For example, in `cmd/hook.go`:
```go
pc, err := knowledge.GetProjectContext(ctx, ks, opts)
if err != nil {
return task.FormatRichContext(ctx, nextTask, plan, nil) // fallback
}
```
This fallback will never fire even if `svc.ListNodesByType` returns a driver error or the SQLite file is corrupted.
Either propagate the first critical error encountered (e.g., when `IncludeConstraints` is true and the DB call fails), or document clearly with a comment that this function never errors and callers should inspect `SearchLog` instead.
How can I resolve this? If you propose a fix, please make it concise.|
|
||
| info, err := os.Stat(path) | ||
|
|
||
| cacheMu.Lock() | ||
| cache[path] = cacheEntry{info: info, err: err, checkedAt: time.Now()} | ||
| cacheMu.Unlock() | ||
| c.mu.Lock() | ||
| c.entries[path] = cacheEntry{info: info, err: err, checkedAt: time.Now()} | ||
| // Evict when cache grows too large | ||
| if len(c.entries) > cacheMaxSize { | ||
| c.evictExpired() | ||
| // Fallback: if still over limit (burst scenario), evict oldest entries | ||
| if len(c.entries) > cacheMaxSize { | ||
| c.evictOldest(len(c.entries) - cacheMaxSize) | ||
| } | ||
| } | ||
| c.mu.Unlock() | ||
|
|
||
| return info, err | ||
| } | ||
|
|
||
| // evictExpired removes entries older than 2x TTL. Caller must hold write lock. | ||
| func (c *statCache) evictExpired() { |
There was a problem hiding this comment.
TOCTOU race in
stat can overwrite a freshly-written cache entry
There is a classic check-then-act race between releasing the read lock and acquiring the write lock. If goroutine A reads a stale entry, drops the RLock, and goroutine B concurrently completes os.Stat and writes a fresh entry under the write lock first, goroutine A will then acquire the write lock and overwrite B's up-to-date entry with A's (potentially older) stat result. On high-frequency watch-mode paths this can produce a spurious stale annotation that persists for up to cacheTTL.
The standard remedy is to re-check the cache under the write lock before writing:
info, err := os.Stat(path)
c.mu.Lock()
// Re-check: another goroutine may have refreshed this entry while we were statting
if existing, ok := c.entries[path]; ok && time.Since(existing.checkedAt) < cacheTTL {
c.mu.Unlock()
return existing.info, existing.err
}
c.entries[path] = cacheEntry{info: info, err: err, checkedAt: time.Now()}
// ... eviction logic
c.mu.Unlock()
return info, errPrompt To Fix With AI
This is a comment left during a code review.
Path: internal/freshness/freshness.go
Line: 213-232
Comment:
**TOCTOU race in `stat` can overwrite a freshly-written cache entry**
There is a classic check-then-act race between releasing the read lock and acquiring the write lock. If goroutine A reads a stale entry, drops the `RLock`, and goroutine B concurrently completes `os.Stat` and writes a fresh entry under the write lock first, goroutine A will then acquire the write lock and overwrite B's up-to-date entry with A's (potentially older) stat result. On high-frequency watch-mode paths this can produce a spurious stale annotation that persists for up to `cacheTTL`.
The standard remedy is to re-check the cache under the write lock before writing:
```go
info, err := os.Stat(path)
c.mu.Lock()
// Re-check: another goroutine may have refreshed this entry while we were statting
if existing, ok := c.entries[path]; ok && time.Since(existing.checkedAt) < cacheTTL {
c.mu.Unlock()
return existing.info, existing.err
}
c.entries[path] = cacheEntry{info: info, err: err, checkedAt: time.Now()}
// ... eviction logic
c.mu.Unlock()
return info, err
```
How can I resolve this? If you propose a fix, please make it concise.| p, err := safepath.SafeJoin(basePath, name) | ||
| if err != nil { | ||
| continue | ||
| } |
There was a problem hiding this comment.
Hardcoded forward-slash separator breaks portability
The inner subdirectory join uses a hardcoded "/" literal instead of filepath.Join:
p, err := safepath.SafeJoin(basePath, e.Name()+"/"+name)On Windows, e.Name()+"/"+name produces subdir/go.mod with a Unix separator. Depending on how safepath.SafeJoin resolves this internally, the resulting path may not match what os.Stat expects, causing hasDependencyFiles to always return false for monorepo sub-manifests on Windows.
| } | |
| p, err := safepath.SafeJoin(basePath, filepath.Join(e.Name(), name)) |
Prompt To Fix With AI
This is a comment left during a code review.
Path: internal/bootstrap/factory.go
Line: 66
Comment:
**Hardcoded forward-slash separator breaks portability**
The inner subdirectory join uses a hardcoded `"/"` literal instead of `filepath.Join`:
```go
p, err := safepath.SafeJoin(basePath, e.Name()+"/"+name)
```
On Windows, `e.Name()+"/"+name` produces `subdir/go.mod` with a Unix separator. Depending on how `safepath.SafeJoin` resolves this internally, the resulting path may not match what `os.Stat` expects, causing `hasDependencyFiles` to always return false for monorepo sub-manifests on Windows.
```suggestion
p, err := safepath.SafeJoin(basePath, filepath.Join(e.Name(), name))
```
How can I resolve this? If you propose a fix, please make it concise.- Propagate critical error from GetProjectContext when constraint DB fetch fails - Resolve ARCHITECTURE.md from basePath directly instead of raw .. traversal - Add TOCTOU re-check in stat cache to prevent stale overwrites under concurrency - Use filepath.Join instead of hardcoded "/" for monorepo subdirectory paths - Replace legacy log.Printf with slog.Debug in doc analysis agent
Summary
Greptile Summary
This is a substantial cleanup PR (89 files, net −1,235 lines) that consolidates the skill/command architecture, hardens path security, and improves bootstrap reliability. The changes are well-motivated and the overall direction is correct — eliminating the shell-out pattern for slash commands in favour of embedded SKILL.md content is a clear quality improvement, and the two-wave bootstrap execution model is a meaningful reliability upgrade.
Key changes and their status:
safepaththroughoutfactory.go,handlers.go,initializer.go): solid. One gap —hasDependencyFilesinfactory.gouses a hardcoded/separator for subdirectory manifest paths instead offilepath.Join, which will produce wrong paths on Windows.knowledge/context.go): theGetProjectContextfunction is a good centralisation, but itserrorreturn value is alwaysnil— all failures are silently swallowed intoSearchLog. Callers'if err != nilfallback branches are permanently dead code, masking DB failures.freshness.go): bounded size + TTL eviction is a genuine improvement. However, there is a TOCTOU race: after releasing theRLock, a concurrent goroutine can refresh the entry before the stale-reader acquires theLock, resulting in the fresh entry being overwritten with an older stat result.skills.List()still returns whatever directories exist under the embed glob, so any leftover deprecated directories would silently reappear in the list.react_bootstrap.go): clean implementation with properErrNoToolCallingfallback sentinel. One inconsistency:analysis_doc.gologs via legacylog.Printfwhile every other agent path useslog/slog.llm/models.go): updated with newest-first ordering.GetCategoryBadgewas removed — confirm no callers outside the diff use it (the UI rendering path should be audited).cmd/hook.go: the previous review comment aboutstaleIDlogging was addressed cleanly.Confidence Score: 3/5
GetProjectContextnil-error contract silently kills error-handling fallback paths that callers have coded; (2) the TOCTOU instatCache.statcan overwrite a fresh entry with a stale result under concurrent load; (3) the hardcoded/separator inhasDependencyFilesbreaks Windows. These are not catastrophic but are correctness issues introduced by this PR rather than pre-existing. The remainder of the PR is clean, well-structured, and a genuine improvement.internal/knowledge/context.go(nil error contract),internal/freshness/freshness.go(TOCTOU race),internal/bootstrap/factory.go(path separator portability)Important Files Changed
statCachestruct with TTL eviction and LRU fallback. Good structural improvement, but introduces a TOCTOU race between the RLock release and the subsequent write-lock acquisition that can overwrite a concurrently-refreshed cache entry.GetProjectContextfunction. Two issues: (1) theerrorreturn is always nil — all failures are silently logged toSearchLogmaking the error contract misleading; (2)archPathuses a rawfilepath.Joinwith..bypassing the safepath abstraction introduced elsewhere in this PR.hasDependencyFilesusese.Name()+"/"+name(hardcoded/) instead offilepath.Join, which will produce incorrect paths on Windows.slog.Debugrather than silently dropped. Clean implementation with appropriate context budget truncation.!taskwing slash) to embedded SKILL.md content. AddsdeprecatedSlashCommandspruning list for upgrade cleanup. SafeJoin used for tw-* directory removal. ThecreateClaudeSkillspath is now the canonical path for Claude Code, while the generic path also embeds content — duplication is minor but intentional.decisionsfinding type extracted from doc analysis. Uses legacylog.Printfinconsistently — the rest of the agent package usesslog. Otherwise logic is clean with appropriate minimum-findings threshold guarding.GetCategoryBadgeremoved — verify no callers remain outside the diff.Get,GetBody, andListbacked by//go:embed. TheList()function returns all directories matchingtaskwing-*/SKILL.md— if any deprecated skill directories remain in the tree, they will appear in this list even though they are no longer generated as commands.staleIDbefore clearing), syncsTasksCompletedfrom DB (source of truth instead of session cache), and adds a one-retry save path for hook session writes. Clean improvements throughout.defer Close()),ErrNoToolCallingsentinel for graceful fallback, and configurablemaxStepsto prevent run-away tool loops.Sequence Diagram
sequenceDiagram participant CLI as CLI / MCP Hook participant Runner as bootstrap.Runner participant W1 as Wave 1<br/>(doc + deps) participant W2 as Wave 2<br/>(code + git) participant KCtx as knowledge.GetProjectContext participant Cache as freshness.statCache participant DB as SQLite / Memory DB CLI->>Runner: RunWithOptions(ctx, path, opts) Runner->>W1: runParallel(doc, deps) W1-->>Runner: wave1Results + wave1Err (logged, non-fatal) Runner->>Runner: buildWaveContext(wave1Results) Runner->>W2: runParallel(code, git) [+ExistingContext=wave1_summary] W2-->>Runner: wave2Results Runner-->>CLI: append(wave1Results, wave2Results) CLI->>KCtx: GetProjectContext(ctx, svc, opts) KCtx->>DB: ListNodesByType(constraint) DB-->>KCtx: constraints (errors → SearchLog only) KCtx->>Cache: stat(evidencePath) per node Cache-->>KCtx: FileInfo (TTL-checked, bounded eviction) KCtx->>DB: Search(query, nodesPerQuery) DB-->>KCtx: ScoredNodes KCtx-->>CLI: *ProjectContext, nil (always nil error)Comments Outside Diff (1)
internal/migration/upgrade_test.go, line 122-126 (link)The test checks for
ask.mdin the namespace directory, butaskwas removed fromSlashCommandsin this PR. The new canonical command set is{plan, next, done, context}, so the generated files will beplan.md,next.md,done.md, andcontext.md— neverask.md. This test will always fail in CI.Prompt To Fix With AI
Prompt To Fix All With AI
Last reviewed commit: "fix: address remaini..."