
[DNM] feat: delete go-header store cmd #3040

Open
julienrbrt wants to merge 28 commits into main from julien/go-header-clean

Conversation

@julienrbrt
Member

@julienrbrt julienrbrt commented Feb 2, 2026

Overview

evm cleanup-goheader

@github-actions
Contributor

github-actions bot commented Feb 2, 2026

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

Build | Format | Lint | Breaking | Updated (UTC)
✅ passed | ⏩ skipped | ✅ passed | ✅ passed | Feb 6, 2026, 11:04 AM

@gemini-code-assist
Contributor

Summary of Changes

Hello @julienrbrt, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new command-line utility designed to clean up obsolete data from the application's database. The data, previously duplicated by the go-header library for P2P synchronization, is now redundant due to a recent architectural shift towards a unified data storage mechanism. The new command provides a safe and controlled way to reclaim disk space by removing these legacy entries, including a dry-run option for verification before irreversible changes are made.

Highlights

  • New Cleanup Command: A new command, cleanup-goheader, has been introduced to remove legacy go-header store data from the database.
  • Data Deletion Target: This command specifically targets and deletes data stored under the /headerSync and /dataSync prefixes, which are no longer needed after a migration to a unified store approach.
  • Dry Run Functionality: The command includes a --dry-run flag, allowing users to preview which keys would be deleted without actually performing the deletion, enhancing safety (a minimal wiring sketch follows this list).
  • Integration: The new cleanup-goheader command is integrated into the main application's command-line interface.
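
For orientation, here is a minimal sketch of how such a subcommand and its --dry-run flag are typically wired with cobra. The function name, flag help text, and body below are illustrative assumptions, not the PR's actual code.

// Illustrative sketch only - not the PR's implementation.
func newCleanupGoHeaderCmd() *cobra.Command {
    cmd := &cobra.Command{
        Use:   "cleanup-goheader",
        Short: "Delete legacy go-header store data (/headerSync and /dataSync prefixes)",
        RunE: func(cmd *cobra.Command, args []string) error {
            dryRun, err := cmd.Flags().GetBool("dry-run")
            if err != nil {
                return err
            }
            // Open the database and call deletePrefix for each legacy prefix here,
            // passing dryRun through so no keys are removed during a preview.
            _ = dryRun
            return nil
        },
    }
    cmd.Flags().Bool("dry-run", false, "preview which keys would be deleted without deleting them")
    return cmd
}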


Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature | Command | Description
Code Review | /gemini review | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state.
Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help | /gemini help | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the root of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in sharing feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@claude
Contributor

claude bot commented Feb 2, 2026

Claude finished @julienrbrt's task in 1m 57s — View job


Code Review Complete

I've performed a comprehensive review of the new cleanup-goheader command. Here are my findings across the six focus areas:


1. Code Quality & Architecture

✅ Positive Aspects

  • Clean separation of concerns: The tool is properly isolated in tools/db-cleaner/ as a standalone module
  • Idiomatic Go: Code follows Go conventions with proper error wrapping using %w
  • Clear purpose: Single-purpose utility for a specific cleanup task

⚠️ Issues & Improvements

Critical: Command name mismatch

Location: cleanup_goheader.go:52

Use:   "clean-evm",
  • The command is named clean-evm but the function and PR description suggest it should be cleanup-goheader or clean-goheader
  • This creates confusion - the tool is about cleaning go-header data, not EVM data
  • Recommendation: Change to Use: "cleanup-goheader" to match the purpose

Medium: Potential memory issues with large datasets

Location: cleanup_goheader.go:123-159 (deletePrefix function)

Gemini bot correctly identified this. The current implementation loads all delete operations into a single batch:

for result := range results.Next() {
    // ... adds all deletes to batch ...
}
if !dryRun && count > 0 {
    if err := batch.Commit(ctx); err != nil {
        // commits everything at once
    }
}

Problem: If there are millions of keys under /headerSync or /dataSync, this could consume excessive memory.

Recommendation: Implement batched commits (e.g., commit every 1000-10000 keys):

const batchCommitThreshold = 10000

for result := range results.Next() {
    if result.Error != nil {
        return count, fmt.Errorf("error iterating results: %w", result.Error)
    }
    
    if !dryRun {
        if err := batch.Delete(ctx, ds.NewKey(result.Key)); err != nil {
            return count, fmt.Errorf("failed to delete key %s: %w", result.Key, err)
        }
    }
    count++
    
    // Commit in chunks to limit memory usage
    if !dryRun && count%batchCommitThreshold == 0 {
        if err := batch.Commit(ctx); err != nil {
            return count, fmt.Errorf("failed to commit batch delete: %w", err)
        }
        // Create new batch for next chunk
        batch, err = pdb.Batch(ctx)
        if err != nil {
            return count, fmt.Errorf("failed to create new batch: %w", err)
        }
    }
}

Minor: Const naming clarity

Location: cleanup_goheader.go:18

const evmDbName = "evm-single"

Gemini's complaint about undefined variable was actually incorrect (the const IS defined at line 18). However, the name evmDbName is slightly misleading since this is the ev-node database name, not specifically an "EVM" database. Consider evNodeDbName for clarity.


2. Error Handling & Resilience

✅ Positive Aspects

  • Consistent error wrapping with %w for proper error chain preservation
  • Proper defer with error checking on database close
  • Context propagation throughout the call chain

⚠️ Issues & Improvements

Medium: Results iterator not closed on error

Location: cleanup_goheader.go:127-142

results, err := pdb.Query(ctx, query.Query{
    KeysOnly: true,
})
if err != nil {
    return 0, fmt.Errorf("failed to query keys with prefix %s: %w", prefix, err)
}

// No defer results.Close() here!

for result := range results.Next() {
    if result.Error != nil {
        return count, fmt.Errorf("error iterating results: %w", result.Error)
    }
    // ...
}

Problem: If an error occurs during iteration, the function returns early without closing the results iterator, potentially leaking resources.

Recommendation: Add proper cleanup:

results, err := pdb.Query(ctx, query.Query{
    KeysOnly: true,
})
if err != nil {
    return 0, fmt.Errorf("failed to query keys with prefix %s: %w", prefix, err)
}
defer results.Close() // Ensure cleanup even on early return

Medium: Batch not committed on partial success

Location: cleanup_goheader.go:152-156

If an error occurs mid-iteration after some successful deletes but before commit, those operations are lost. Consider whether partial progress should be saved or the operation should be atomic. Document the behavior.
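
If partial progress should be preserved, one option (a sketch against the snippet above; the variables mirror deletePrefix and are illustrative) is to flush the already-queued deletes before surfacing the iteration error:

for result := range results.Next() {
    if result.Error != nil {
        // Flush whatever has already been queued so earlier deletes are not
        // silently lost, then surface the iteration error.
        if !dryRun {
            if cerr := batch.Commit(ctx); cerr != nil {
                return count, fmt.Errorf("iteration failed (%v); committing partial batch also failed: %w", result.Error, cerr)
            }
        }
        return count, fmt.Errorf("error iterating results: %w", result.Error)
    }
    // ... delete (when not in dry-run) and increment count as before ...
}

Whichever behavior is chosen, atomic or best-effort, it should be stated in the command's help text.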

Low: Missing context cancellation check

The delete loop could run for a long time with large datasets. Consider checking ctx.Err() periodically in the loop to allow graceful cancellation.
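
A minimal sketch of such a check, reusing the loop variables from the snippets above (the check interval is an arbitrary assumption):

for result := range results.Next() {
    if result.Error != nil {
        return count, fmt.Errorf("error iterating results: %w", result.Error)
    }
    // Allow graceful cancellation on long runs; checking every 1000 keys keeps
    // the overhead negligible.
    if count%1000 == 0 {
        if err := ctx.Err(); err != nil {
            return count, fmt.Errorf("cleanup cancelled after %d keys: %w", count, err)
        }
    }
    // ... delete (when not in dry-run) and increment count as before ...
}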


3. Security

✅ Positive Aspects

  • No credential exposure
  • No user input processing (prefix values are hardcoded constants)
  • Proper use of context for operation lifecycle

⚠️ Issues & Improvements

Medium: Database lock check missing

Location: cleanup_goheader.go:75

The documentation warns "Make sure the node is stopped before running this command" but there's no programmatic check to verify the database isn't in use.

Risk: If a user accidentally runs this while the node is running, it could cause:

  • Database corruption
  • Data race conditions
  • Crash of the running node

Recommendation: Add a lock file check or database lock detection before proceeding:

// Attempt to acquire an exclusive lock
rawDB, err := store.NewDefaultKVStore(nodeConfig.RootDir, nodeConfig.DBPath, evmDbName)
if err != nil {
    // "locked" also contains "lock", so a single substring check suffices
    if strings.Contains(err.Error(), "lock") {
        return fmt.Errorf("database is locked - ensure the node is stopped before running cleanup: %w", err)
    }
    return fmt.Errorf("failed to open database: %w", err)
}

Better yet, implement a proper lock file mechanism similar to what other database tools use.
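
A minimal sketch of such a mechanism, assuming a dedicated lock file under the node's root directory (the path and helper name are illustrative, and this only guards against other processes that honor the same file; reliably detecting a running node still depends on the database engine's own lock):

// Illustrative lock-file guard: refuse to run if the lock already exists,
// and remove it when cleanup finishes.
func acquireCleanupLock(rootDir string) (release func(), err error) {
    lockPath := filepath.Join(rootDir, "cleanup.lock")
    f, err := os.OpenFile(lockPath, os.O_CREATE|os.O_EXCL|os.O_WRONLY, 0o644)
    if err != nil {
        if os.IsExist(err) {
            return nil, fmt.Errorf("lock file %s already exists - is the node or another cleanup still running?", lockPath)
        }
        return nil, fmt.Errorf("failed to create lock file: %w", err)
    }
    return func() {
        _ = f.Close()
        _ = os.Remove(lockPath)
    }, nil
}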


4. Performance & Resource Efficiency

✅ Positive Aspects

  • Uses KeysOnly: true in query to avoid loading values (good optimization)
  • Batching deletes instead of individual operations

⚠️ Issues Identified

Critical: Unbounded memory usage (already covered in Architecture section)

See the batched commit recommendation in section 1.

Low: Missing progress indication

Location: cleanup_goheader.go:87-96

For large datasets, users have no feedback during a long-running operation. Consider adding periodic progress updates:

const progressInterval = 10000
for result := range results.Next() {
    // ... existing code ...
    count++
    if count%progressInterval == 0 {
        cmd.Printf("Processed %d keys...\n", count)
    }
}

5. Testing & Reliability

❌ Critical Issue: No tests

Problem: The entire tools/db-cleaner/ directory has no test files.

Required tests:

  1. Unit tests for deletePrefix:

    • Test with empty database
    • Test with small number of keys
    • Test with keys that should NOT be deleted (wrong prefix)
    • Test dry-run mode
    • Test error handling (database errors, iteration errors)
  2. Integration tests:

    • Create a test database with legacy go-header data
    • Run the cleanup command
    • Verify data is deleted
    • Verify other data is untouched
    • Test with large number of keys (performance)
  3. Edge cases:

    • Database doesn't exist
    • Database is locked
    • Partial deletion (what if only half the keys can be deleted?)
    • Context cancellation during operation

Example test structure:

func TestDeletePrefix(t *testing.T) {
    ctx := context.Background()
    db, err := store.NewTestInMemoryKVStore()
    require.NoError(t, err)
    defer db.Close()
    
    evDB := store.NewEvNodeKVStore(db)
    
    // Setup: add test data
    // ... populate with headerSync and dataSync keys ...
    // ... populate with other keys that should NOT be deleted ...
    
    // Test deletion
    count, err := deletePrefix(ctx, evDB, headerSyncPrefix, false)
    require.NoError(t, err)
    assert.Equal(t, expectedCount, count)
    
    // Verify keys are gone
    // Verify other keys remain
}

Recommendation: Add comprehensive test coverage before merging. This is especially critical for a data deletion tool.


6. Documentation & Developer Experience

✅ Positive Aspects

  • Good inline comments explaining the purpose
  • Clear warning message about stopping the node
  • Helpful dry-run flag

⚠️ Issues & Improvements

Medium: Missing README

Location: tools/db-cleaner/

A standalone tool in a separate module should have a README explaining:

  • What the tool does
  • When to use it
  • How to build it
  • How to run it
  • Safety considerations
  • Examples

Recommendation: Add tools/db-cleaner/README.md:

# DB Cleaner

Tool for cleaning up legacy data from the ev-node database.

## cleanup-goheader

Removes legacy go-header store data that was duplicated before the unified store migration.

### Safety

**IMPORTANT**: Stop the node before running this command. Running it while the node is active can cause database corruption.

### Usage

Build the tool:

```bash
cd tools/db-cleaner
go build -o db-cleaner
```

Dry run (preview what would be deleted):

    ./db-cleaner cleanup-goheader --home /path/to/node --dry-run

Delete legacy data:

    ./db-cleaner cleanup-goheader --home /path/to/node

### What it deletes

- /headerSync/* - Legacy header sync data
- /dataSync/* - Legacy data sync data

This data is safe to delete after the unified store migration.


Low: Command output could be more informative

Location: cleanup_goheader.go:100-110

Current output:

Successfully deleted 15000 keys (10000 headerSync, 5000 dataSync)

Consider adding:

  • Time taken (a minimal sketch follows this list)
  • Disk space freed (if feasible)
  • Clear confirmation this is safe and expected
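
For the time-taken item, a minimal sketch (assuming the command calls deletePrefix for both prefixes; the variable names are illustrative):

// Illustrative: report elapsed time alongside the deletion counts.
start := time.Now()
headerCount, err := deletePrefix(ctx, evDB, headerSyncPrefix, dryRun)
if err != nil {
    return err
}
dataCount, err := deletePrefix(ctx, evDB, dataSyncPrefix, dryRun)
if err != nil {
    return err
}
cmd.Printf("Successfully deleted %d keys (%d headerSync, %d dataSync) in %s\n",
    headerCount+dataCount, headerCount, dataCount, time.Since(start).Round(time.Millisecond))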

Minor: Missing Makefile/build instructions

Add an entry to the root Makefile or document how to build this tool.

Summary

Critical Issues (Must Fix)

  1. ❌ No tests - Add comprehensive test coverage for a deletion tool
  2. ⚠️ Unbounded memory usage - Implement batched commits for large datasets
  3. ⚠️ Resource leak - Add defer results.Close() in deletePrefix

High Priority (Should Fix)

  4. ⚠️ No database lock check - Prevent running while node is active
  5. ⚠️ Command name mismatch - Rename from "clean-evm" to "cleanup-goheader"

Medium Priority (Nice to Have)

  6. 📝 Missing README - Document the tool's purpose and usage
  7. 🔄 No progress indication - Add periodic progress updates for UX
  8. 📦 Batch error handling - Clarify behavior on partial failures

Low Priority (Optional)

  9. 🧹 Minor naming improvements (evmDbName → evNodeDbName)
  10. ⏱️ Add context cancellation checks in long-running loop

Gemini Bot Review Assessment

The Gemini bot identified 2 issues:

  1. ✅ Correct: Memory issue with large batches (confirmed and expanded upon)
  2. ❌ Incorrect: "undefined evmDbName" - the variable IS defined at line 18 as a const
---

## Recommendations Before Merge

Since this PR is marked "[DNM]" (Do Not Merge), here's what should be addressed:

**Blockers**:
- Add test coverage (at minimum, basic unit tests for deletePrefix)
- Fix resource leak (defer results.Close)
- Implement batched commits for memory efficiency
- Fix command naming inconsistency

**Strongly Recommended**:
- Add database lock check
- Add README documentation
- Add progress indication for user feedback


Todo List:

  • [x] Read and analyze all changed files
  • [x] Review code quality & architecture
  • [x] Review error handling & resilience
  • [x] Review security considerations
  • [x] Review performance & resource efficiency
  • [x] Review testing & reliability
  • [x] Review documentation & developer experience
  • [x] Compile and post comprehensive feedback

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a cleanup-goheader command to remove legacy data from the store. The implementation is straightforward, but I've found a critical issue that will prevent compilation due to an undefined variable. I've also pointed out a potential memory issue when deleting a large number of keys and suggested an improvement for robustness. Once these are addressed, the command should work as intended.

}

// Open the database
rawDB, err := store.NewDefaultKVStore(nodeConfig.RootDir, nodeConfig.DBPath, evmDbName)

critical

The variable evmDbName is not defined in this file or package, which will cause a compilation error. It should be defined, likely as a constant, representing the name of the EVM database.

Comment on lines 98 to 134
func deletePrefix(ctx context.Context, db ds.Batching, prefix string, dryRun bool) (int, error) {
    results, err := db.Query(ctx, dsq.Query{
        Prefix:   prefix,
        KeysOnly: true,
    })
    if err != nil {
        return 0, fmt.Errorf("failed to query keys with prefix %s: %w", prefix, err)
    }
    defer results.Close()

    count := 0
    batch, err := db.Batch(ctx)
    if err != nil {
        return 0, fmt.Errorf("failed to create batch: %w", err)
    }

    for result := range results.Next() {
        if result.Error != nil {
            return count, fmt.Errorf("error iterating results: %w", result.Error)
        }

        if !dryRun {
            if err := batch.Delete(ctx, ds.NewKey(result.Key)); err != nil {
                return count, fmt.Errorf("failed to delete key %s: %w", result.Key, err)
            }
        }
        count++
    }

    if !dryRun && count > 0 {
        if err := batch.Commit(ctx); err != nil {
            return count, fmt.Errorf("failed to commit batch delete: %w", err)
        }
    }

    return count, nil
}

medium

This function loads all delete operations for a given prefix into a single batch. If there are a very large number of keys to delete, this could lead to high memory consumption. Consider processing the deletions in smaller batches (e.g., committing every 1000 keys) to limit memory usage and make the operation more robust against large datasets.

@codecov

codecov bot commented Feb 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 61.61%. Comparing base (c2afe0c) to head (7472bd7).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3040      +/-   ##
==========================================
- Coverage   61.66%   61.61%   -0.05%     
==========================================
  Files         111      111              
  Lines       11120    11120              
==========================================
- Hits         6857     6852       -5     
- Misses       3525     3530       +5     
  Partials      738      738              
Flag | Coverage Δ
combined | 61.61% <ø> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.

Base automatically changed from julien/simplify-store to main on February 2, 2026, 13:35
