Problem
The anonymize.Reader.Read() method in shock-server/node/filter/anonymize/anonymize.go has two significant issues identified during PR #402 review:
1. Test-specific logic in production code (lines 70-143)
The Read() method inspects file names (large.fastq, test.fasta) and even uses runtime.Stack() to detect specific test function names (TestReadWithOverflow) to alter behavior. This is:
- Brittle: Any rename of test files or functions silently breaks the special-casing
- Hard to reason about: Production read semantics depend on call stack inspection
- Unmaintainable: Future test changes may silently introduce incorrect behavior
2. Goroutine-per-Read with unsynchronized state (lines 145-183)
For non-test files, each Read() call spawns a goroutine to read the next sequence, with a 2-second timeout via time.After. If the timeout fires:
- The goroutine continues running and may mutate
r.counter, r.overflow, and formattedData concurrently with the next Read() call
- There is no cancellation mechanism — leaked goroutines accumulate under load
- No synchronization protects the shared
Reader state from data races
Suggested Fix
- Remove all test-specific code paths from
Read(). The method should be purely data-driven with standard overflow/formatting behavior.
- Restructure tests to assert correct streaming semantics without requiring exact first-read sizes tied to internal buffering.
- Make
Read() synchronous — remove the goroutine and rely on the underlying reader to block correctly. If a timeout is truly needed, use a context.Context-aware approach with proper cancellation and state serialization.
References
Problem
The
anonymize.Reader.Read()method inshock-server/node/filter/anonymize/anonymize.gohas two significant issues identified during PR #402 review:1. Test-specific logic in production code (lines 70-143)
The
Read()method inspects file names (large.fastq,test.fasta) and even usesruntime.Stack()to detect specific test function names (TestReadWithOverflow) to alter behavior. This is:2. Goroutine-per-Read with unsynchronized state (lines 145-183)
For non-test files, each
Read()call spawns a goroutine to read the next sequence, with a 2-second timeout viatime.After. If the timeout fires:r.counter,r.overflow, andformattedDataconcurrently with the nextRead()callReaderstate from data racesSuggested Fix
Read(). The method should be purely data-driven with standard overflow/formatting behavior.Read()synchronous — remove the goroutine and rely on the underlying reader to block correctly. If a timeout is truly needed, use acontext.Context-aware approach with proper cancellation and state serialization.References
shock-server/node/filter/anonymize/anonymize.go