Define proto files for the TemporalFS archetype following the activity archetype pattern:
- state.proto: FilesystemState, FilesystemConfig, FSStats, FilesystemStatus enum
- tasks.proto: ChunkGCTask, ManifestCompactTask, QuotaCheckTask
- request_response.proto: Request/response types for all FS operations
- service.proto: TemporalFSService gRPC service with routing annotations

Generated Go bindings in chasm/lib/temporalfs/gen/temporalfspb/.
Implement the TemporalFS archetype following the activity pattern:
- filesystem.go: Root component with lifecycle state and search attributes
- statemachine.go: State transitions (Create, Archive, Delete)
- library.go: CHASM library registration with component and tasks
- config.go: Dynamic config and default filesystem configuration
- search_attributes.go: FilesystemStatus search attribute
- handler.go: gRPC handler with CreateFilesystem, GetFilesystemInfo, ArchiveFilesystem implemented; FS operations stubbed
- tasks.go: ChunkGC, ManifestCompact, QuotaCheck task executors (stubs)
- fx.go: FX module for history service wiring
- errors.go: Shared error definitions

Wire TemporalFS HistoryModule into service/history/fx.go.
Define the pluggable FSStoreProvider interface — the sole extension point for SaaS to provide a WalkerStore implementation. Includes FSStore and FSBatch interfaces for key-value operations. Provide InMemoryStoreProvider as a development/testing placeholder. The production OSS implementation will use PebbleDB once the temporal-fs module is integrated as a dependency. Wire FSStoreProvider into the FX module with InMemoryStoreProvider as default.
Replace InMemoryStoreProvider with PebbleStoreProvider backed by temporal-fs. The handler now uses temporal-fs APIs for Getattr (StatByID), ReadChunks (ReadAtByID), WriteChunks (WriteAtByID), CreateSnapshot, and CreateFilesystem (tfs.Create). Operations requiring inode-based directory access (Lookup, ReadDir, Mkdir, etc.) remain stubbed until temporal-fs exposes those APIs.
ChunkGC executor now opens the FS store and runs f.RunGC() to process tombstones and delete orphaned chunks, then reschedules itself. QuotaCheck executor reads FS metrics to update stats and warns when size quota is exceeded. ManifestCompact remains a placeholder since compaction operates at the PebbleDB shard level.
Tests cover:
- Filesystem lifecycle state (RUNNING/ARCHIVED/DELETED mapping)
- Terminate sets status to DELETED
- SearchAttributes returns status attribute
- StateMachineState with nil and non-nil state
- TransitionCreate with custom config, defaults, and zero GC interval
- TransitionArchive/Delete with valid and invalid source states
Tests cover:
- ChunkGC/ManifestCompact/QuotaCheck Validate (RUNNING only)
- ChunkGC Execute with real PebbleDB store and GC rescheduling
- ChunkGC Execute with zero interval (no rescheduling)
- QuotaCheck Execute initializes stats
- FS metrics tracking on open instance
Tests cover:
- openFS/createFS helpers with real PebbleDB
- createFS default chunk size fallback
- inodeToAttr conversion
- mapFSError nil passthrough
- Getattr on root inode
- ReadChunks/WriteChunks round-trip
- CreateSnapshot returns valid txn ID
- All stubbed methods return errNotImplemented
Tests cover:
- Full lifecycle: create FS → write → read → getattr → snapshot
- PebbleStoreProvider partition isolation across filesystems
- PebbleStoreProvider Close releases all resources
Documents the internal architecture following the Scheduler archetype pattern in docs/architecture/. Covers component tree, state machine, tasks, pluggable storage (FSStoreProvider), gRPC service RPCs, FX wiring, and configuration defaults.
Switch from local replace directive to the published v1.0.0 release at github.com/temporalio/temporal-fs.
These planning documents (1-pager, PRD, design doc) should remain as local working files, not committed to the repository.
The per-shard PebbleDB map was a leaky abstraction from the SaaS per-shard Walker model. Since all handler operations use shardID=0 and PrefixedStore already provides key isolation between filesystem executions, a single PebbleDB instance is sufficient.
Wire all 14 stub methods to temporal-fs ByID APIs: Lookup, Setattr, Truncate, Mkdir, Unlink, Rmdir, Rename, ReadDir, Link, Symlink, Readlink, CreateFile, Mknod, Statfs. Add proper mapFSError with full error mapping to gRPC service errors. Remove errNotImplemented.
Replace TestStubsReturnNotImplemented with 15 real tests covering Lookup, Setattr, Truncate, Mkdir, Unlink, Rmdir, Rename, ReadDir, Link, Symlink, Readlink, CreateFile, Mknod, and Statfs handler methods.
- All 14 previously stubbed RPCs are now implemented with ByID methods
- Storage diagram shows CDSStoreProvider (SaaS) instead of placeholder
- Add CDSStoreProvider description with link to saas-temporal CDS doc
The moedash/temporal fork's CI needs access to temporal-fs. Since moedash/temporal-fs is accessible to the fork's CI, add a replace directive to source the module from there instead of temporalio/temporal-fs.
Add replace directive in go.mod to source temporal-fs from moedash/temporal-fs. Configure git credentials and GOPRIVATE/GONOSUMCHECK in CI workflows so go mod download can fetch the private module.
Composite actions can't access secrets directly, so add a go-private-token input and pass it from all calling workflows.
PebbleStoreProvider used an in-memory counter for partition IDs that reset on restart, causing FS data to map to wrong PrefixedStore prefixes. Replace with deterministic FNV-1a hash of namespaceID+filesystemID so partition IDs are stable across restarts. Add test for cross-instance stability.
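A minimal sketch of how a restart-stable partition ID could be derived with FNV-1a from the standard library. The helper name `partitionID`, the 64-bit width, and the NUL separator between the two IDs are assumptions made for illustration; the commit only states that an FNV-1a hash of namespaceID+filesystemID is used.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// partitionID derives a deterministic ID from the two identifiers.
// Hypothetical helper: name, width, and separator are assumptions.
func partitionID(namespaceID, filesystemID string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(namespaceID)) // hash.Hash.Write never returns an error
	h.Write([]byte{0})           // separator so ("ab","c") and ("a","bc") hash differently
	h.Write([]byte(filesystemID))
	return h.Sum64()
}

func main() {
	// The same inputs always yield the same ID, in this process or the next.
	fmt.Println(partitionID("ns-1", "fs-1") == partitionID("ns-1", "fs-1"))
	// Without the separator these two pairs would feed identical bytes to the hash.
	fmt.Println(partitionID("ab", "c") != partitionID("a", "bc"))
}
```

Because the ID depends only on its inputs, no counter state has to survive a restart.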
openFS and createFS leaked stores on error paths — the store was returned to callers but never closed on failure. Now both methods close the store internally on error and return only (*tfs.FS, error). Callers no longer receive the store since f.Close() handles the store lifecycle. Also add named constants for Statfs virtual capacity magic numbers and wrap store-level errors through mapFSError for consistent error mapping.
QuotaCheckTaskExecutor silently swallowed GetStore/Open errors, returning nil (success) so the task was never retried. Now returns the error for retry by the task framework. ChunkGCTaskExecutor ignored f.Close() errors. Now logs a warning.
- Fix PebbleStoreProvider description to reflect FNV-1a partition IDs
- Add WAL integration section covering walEngine, stateTracker, flusher, and recovery pipeline
- Clarify handler store lifecycle (openFS/createFS close on error)
- fx.go: Add fx.Lifecycle hook to close PebbleStoreProvider on shutdown, preventing PebbleDB resource leak.
- statemachine.go: Nil-check FilesystemState in SetStateMachineState to prevent panic on zero-value Filesystem.
- tasks.go: Log warning on f.Close() error in quotaCheckTaskExecutor (was silently ignored).
ReadDir was calling ReadDirByID then StatByID for every entry (N+1 queries). Now uses ReadDirPlusByID which returns embedded inode data from the dir_scan keys, falling back to StatByID only for hardlinked files where the inode isn't embedded. Also modernize Statfs min() usage.
ErrClosed and ErrVersionMismatch were not mapped, causing raw internal errors to leak to clients. ErrLockConflict was also unmapped. Now:
- ErrLockConflict → FailedPrecondition
- ErrClosed, ErrVersionMismatch → Unavailable
ChunkCount is uint64 and ChunksDeleted could exceed it (e.g., when stats drift from actual FS state or on first GC run with zero-init stats). The subtraction would wrap to a massive value, permanently corrupting the persisted CHASM stats. Now clamps to zero.
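The wraparound and the clamp can be illustrated in a few lines. `subClamped` is a hypothetical helper name; the commit only says the result is now clamped to zero.

```go
package main

import "fmt"

// subClamped returns count-deleted, clamping at zero instead of wrapping.
func subClamped(count, deleted uint64) uint64 {
	if deleted >= count {
		return 0
	}
	return count - deleted
}

func main() {
	var count, deleted uint64 = 3, 5
	fmt.Println(count - deleted)            // unsigned wraparound: 18446744073709551614
	fmt.Println(subClamped(count, deleted)) // 0
}
```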
120+ research topics across science, tech, policy, and medicine domains. Five template-based markdown generators produce deterministic content for each workflow step (sources, summary, fact-check, report, review).
DemoStore wraps a shared PebbleDB with manifest management for tracking workflows. The Temporal workflow chains 5 activities (WebResearch, Summarize, FactCheck, FinalReport, PeerReview), each writing files and creating MVCC snapshots through TemporalFS. Random failures are injected per-activity with attempt-aware seeding so retries can succeed.
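One way attempt-aware seeding might work is to fold the attempt number into the seed, so each retry draws a fresh but reproducible outcome. The function name, parameters, and the choice of FNV-1a for mixing are all assumptions for illustration; the demo's actual scheme is not shown in this description.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/rand"
)

// shouldFail decides deterministically whether a given attempt of an activity
// fails. Hypothetical helper: the inputs and the mixing scheme are assumptions.
func shouldFail(seed int64, workflowID, activity string, attempt int, rate float64) bool {
	h := fnv.New64a()
	// Mixing in the attempt number gives every retry its own random draw.
	fmt.Fprintf(h, "%d|%s|%s|%d", seed, workflowID, activity, attempt)
	r := rand.New(rand.NewSource(int64(h.Sum64())))
	return r.Float64() < rate
}

func main() {
	for attempt := 1; attempt <= 3; attempt++ {
		fmt.Printf("attempt %d fails: %v\n", attempt,
			shouldFail(42, "wf-7", "Summarize", attempt, 0.5))
	}
}
```

If the attempt number were left out of the seed, an activity that failed once would fail on every retry with the same inputs.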
Runner starts N workflows via Temporal SDK with semaphore-based concurrency control and atomic stat counters. Dashboard renders a live ANSI terminal TUI at 200ms refresh with progress bar, throughput metrics, and a 12-line color-coded activity feed.
main.go provides run/report/browse subcommands. Report generates a self-contained HTML file with dark theme, stat cards, workflow table, and expandable filesystem explorer. README covers usage, demo script, architecture, and file structure.
Activities now open the FS and verify prior step's files exist BEFORE injecting failures. On retry, each activity logs the number of files from the previous step and the last snapshot name, proving TemporalFS durability across failures. Retries are counted in real-time via shared RunStats so the dashboard shows them as they happen.
Store workflow results (retries, status) in the manifest after each workflow completes. The HTML report now shows a "Retries Survived" stat card and per-workflow retry badges (yellow) and status badges (green/red) in the workflow table.
Builds the binary, starts Temporal dev server, runs workflows, lists them in Temporal, browses a filesystem, generates the HTML report, and opens it in the browser. Supports --workflows, --concurrency, --failure-rate, and --seed flags. Cleans up on exit.
- Add --continuous flag: runs workflows indefinitely until Ctrl+C, auto-opens Temporal UI, and generates HTML report on shutdown
- Dashboard shows animated cycling bar with "∞" in continuous mode
- Runner supports labeled break loop for graceful cancellation
- Update run-demo.sh with --continuous flag support
- Update README with run-demo.sh usage, continuous mode docs
- Add .gitignore for demo binaries and generated artifacts
The "tfs: not found" errors were caused by stale Temporal activity tasks from previous runs being delivered to the new worker against a fresh PebbleDB. Fix by:
1. Use unique task queue per run (research-demo-<timestamp>) to isolate each demo run from previous Temporal server state
2. Pre-create all FS partitions before starting workflows to ensure superblocks exist before any activity executes
3. Simplify openFS to 2-return (removed diagnostic mutex and post-close verification that were added during investigation)
4. Remove unused createOrOpenFS method

Tested: 200 workflows, 50 concurrent, failure-rate=1.0, 0 errors.
When --no-dashboard is used, nobody reads from runner.EventCh. After the buffer fills, goroutines block on channel writes and can't finish, causing wg.Wait() to hang on shutdown. Fix: spawn a goroutine to drain events when dashboard is disabled.
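A small sketch of the drain approach: when nothing else consumes the channel, a background goroutine reads and discards events so producers can always complete their sends. `drainEvents` and the string event type are hypothetical simplifications of the runner's event channel.

```go
package main

import (
	"fmt"
	"sync"
)

// drainEvents discards everything sent on ch until it is closed, then
// reports how many events were dropped. Hypothetical helper.
func drainEvents(ch <-chan string) <-chan int {
	done := make(chan int, 1)
	go func() {
		n := 0
		for range ch {
			n++
		}
		done <- n
	}()
	return done
}

func main() {
	events := make(chan string, 2) // small buffer that fills quickly
	dropped := drainEvents(events)

	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// Without a reader, the third send would block and wg.Wait would hang.
			events <- fmt.Sprintf("workflow %d done", i)
		}(i)
	}
	wg.Wait()
	close(events)
	fmt.Println("dropped events:", <-dropped)
}
```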
When the user presses Ctrl+C in continuous mode, in-flight workflows are no longer waited on. Previously these were counted as "failed" because run.Get() returned a context-cancelled error. Now we detect context cancellation and exclude them from the failure count.
Activities now emit started/retrying/completed events to the shared EventCh, giving the live dashboard real-time visibility into each workflow step. Removed --no-dashboard from run-demo.sh so the TUI shows by default during the demo.
- Pipe Temporal SDK and Go log output to <data-dir>/demo.log so the live dashboard isn't buried in log lines
- Fix dashboard box drawing: add visibleLen/boxLine helpers that auto-pad lines to exact box width, ignoring ANSI escape codes
- Reduce progress bar from 40 to 30 chars to fit 66-char box
- Make all EventCh sends in runOne non-blocking (select/default) to prevent goroutines from deadlocking when the channel buffer fills, which was holding semaphore slots and stopping new workflows
- Set WorkflowIDConflictPolicy to TERMINATE_EXISTING so stale workflows from previous runs don't cause ExecuteWorkflow failures
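The select/default idiom for a non-blocking send, shown in isolation. `trySend` and the `event` type are hypothetical names; dropping the event when the buffer is full is the trade-off that keeps the sender from stalling.

```go
package main

import "fmt"

// event is a hypothetical dashboard event.
type event struct{ msg string }

// trySend enqueues ev if the channel has room and reports whether it did.
// It never blocks: a full buffer means the event is dropped.
func trySend(ch chan<- event, ev event) bool {
	select {
	case ch <- ev:
		return true
	default:
		return false
	}
}

func main() {
	ch := make(chan event, 1)
	fmt.Println(trySend(ch, event{"started"}))   // true: buffer has room
	fmt.Println(trySend(ch, event{"completed"})) // false: buffer is full, event dropped
}
```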
Include the task queue name (which contains a per-run timestamp) in workflow IDs instead of using TERMINATE_EXISTING conflict policy. This prevents workflows from previous runs being terminated while ensuring each run's IDs are globally unique.
- Update all import paths from temporal-fs to temporal-zfs
- Rename tfs import alias to tzfs throughout
- Bump dependency to v1.2.0 (module path updated in upstream repo)
Replace all occurrences of "TemporalFS" with "TemporalZFS" in comments, strings, and documentation across chasm/lib/temporalfs/ and tests/. Generated protobuf code, proto definitions, package names, directory paths, and import paths are intentionally left unchanged.
Rename directories, packages, and all references:
- chasm/lib/temporalfs → chasm/lib/temporalzfs
- temporalfspb → temporalzfspb
- docs/architecture/temporalfs.md → temporalzfs.md
- tests/temporalfs_test.go → temporalzfs_test.go
- temporal-fs → temporal-zfs in docs, CI GOPRIVATE/GONOSUMCHECK
- TemporalFS → TemporalZFS in architecture docs
- /tmp/tfs-demo → /tmp/tzfs-demo in demo scripts and README
Proto .pb.go files had stale raw descriptors from the temporalfs→temporalzfs rename. Also fix unused loop variable in run-demo.sh (SC2034).
The research-agent-demo is an example app, not library code. Exclude it from forbidigo (time.Now), errcheck, and revive rules that apply to the chasm/lib package.
What changed?
Introduces the TemporalFS CHASM archetype, a virtual filesystem built on top of the CHASM framework and backed by temporal-fs (PebbleDB). This PR includes:
- state.proto, tasks.proto, request_response.proto, service.proto with generated Go bindings for the TemporalFS gRPC service.
- Filesystem root component, state machine (Create/Archive/Delete transitions), CHASM library registration, dynamic config, search attributes, and FX wiring into service/history.
- FSStoreProvider interface with PebbleStoreProvider (OSS default) using deterministic FNV-1a partition IDs. SaaS overrides this with CDSStoreProvider.
- CreateFilesystem, Getattr, Lookup, ReadDir, Mkdir, CreateFile, WriteChunks, ReadChunks, CreateSnapshot, Setattr, Truncate, Unlink, Rmdir, Rename, Link, Symlink, Readlink, Mknod, Statfs.
- ChunkGC (runs garbage collection and reschedules), QuotaCheck (reads FS metrics and warns on quota breach), ManifestCompact (placeholder).
- ReadDirPlusByID, nil-safety in state machine, FX lifecycle hook to close PebbleDB on shutdown.
- GOPRIVATE/GONOSUMCHECK and git credentials across all workflows for the private temporal-fs dependency (using moedash/temporal-fs fork for CI access).
- docs/architecture/temporalfs.md.

Why?
TemporalFS provides MVCC-snapshotted, append-friendly filesystem storage as a first-class CHASM archetype. This enables use cases like AI research agents that need transactional file operations with snapshot-based time travel: activities can write files, take snapshots, and later read from any historical snapshot for reproducibility. The pluggable FSStoreProvider allows SaaS to swap in CDS-backed storage without changing the core logic.

How did you test it?
- FunctionalTestBase with CHASM-enabled Temporal server exercising the full stack (FX → PebbleStoreProvider → tfs.FS)

Potential risks
- temporal-fs module is sourced from moedash/temporal-fs fork via go.mod replace directive. Before merging to temporalio/temporal main, this must be switched to temporalio/temporal-fs and the CI secret (GO_PRIVATE_TOKEN) must be configured in the upstream repo.
- namespaceID+filesystemID is deterministic but not collision-resistant for adversarial inputs. This is acceptable for internal partition routing but should not be exposed as a user-facing identifier.
- go.mod due to the new temporal-fs dependency tree. Review for unintended transitive dependency upgrades.