Contributing to BatchFlow

Thank you for your interest in contributing to BatchFlow! This document provides guidelines and information for contributors.

🚀 Getting Started

Prerequisites

Go 1.20 or later
Docker and Docker Compose (for integration tests)
Git

Development Setup

Fork and Clone

```
# If you cloned your own fork, substitute its URL here
git clone https://github.com/rushairer/batchflow.git
cd batchflow
```

Install Dependencies

```
go mod download
```

Verify Setup

```
make test-unit
make lint
```

📋 Development Workflow

1. Create a Branch

```
git checkout -b feature/your-feature-name
# or
git checkout -b fix/issue-number
```

2. Make Changes

Write clean, well-documented code
Follow Go best practices and project conventions
Add tests for new functionality
Update documentation as needed

3. Test Your Changes

```
# Run unit tests
make test-unit

# Run linting
make lint

# Run integration tests (optional but recommended)
make docker-sqlite-test
make docker-mysql-test
make docker-postgres-test
make docker-redis-test
```

4. Commit Changes

```
git add .
git commit -m "feat: add new feature description"
# or
git commit -m "fix: resolve issue description"
```

Commit Message Format:

feat: - New features
fix: - Bug fixes
docs: - Documentation changes
test: - Test additions or modifications
refactor: - Code refactoring
perf: - Performance improvements
chore: - Maintenance tasks
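
The prefixes listed above lend themselves to an automated check. The following is a minimal sketch of such a check, not a tool that ships with BatchFlow: the function name `validCommitSubject` and the rule that the prefix must be followed by a single space and a non-empty description are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// commitRe accepts a subject line that starts with one of the prefixes from
// this guide, followed by ": " and a non-empty description.
// The exact spacing rule is an assumption, not a documented project policy.
var commitRe = regexp.MustCompile(`^(feat|fix|docs|test|refactor|perf|chore): \S.*$`)

// validCommitSubject reports whether a single-line commit subject follows
// the prefix convention. It is a hypothetical helper for illustration.
func validCommitSubject(subject string) bool {
	return commitRe.MatchString(subject)
}

func main() {
	for _, s := range []string{
		"feat: add new feature description",
		"update stuff",
	} {
		fmt.Printf("%q valid=%v\n", s, validCommitSubject(s))
	}
}
```

A check like this could run from a local Git hook, so that a badly formatted subject is caught before the commit is pushed.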

5. Push and Create PR

```
git push origin your-branch-name
```

Then create a Pull Request on GitHub.

🧪 Testing Guidelines

Unit Tests

Write tests for all new functions and methods
Aim for at least 80% code coverage
Use table-driven tests where appropriate
Mock external dependencies

Example:

```go
func TestBatchFlow_Submit(t *testing.T) {
    tests := []struct {
        name    string
        request *Request
        wantErr bool
    }{
        {
            name:    "valid request",
            request: NewRequest(schema).SetString("name", "test"),
            wantErr: false,
        },
        // Add more test cases
    }

    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            // Test implementation
        })
    }
}
```

Integration Tests

Test real database interactions
Verify performance characteristics
Test error handling and edge cases
Use Docker containers for consistent environments

Performance Tests

Add benchmarks for performance-critical code
Monitor memory allocations
Test with realistic data volumes

Example:

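```go
func BenchmarkBatchFlow_Submit(b *testing.B) {
    // ctx, config, and schema are assumed to be defined by the test setup.
    batch, _ := NewBatchFlowWithMock(ctx, config)
    request := NewRequest(schema).SetString("name", "test")

    b.ReportAllocs() // include allocations per operation in the output
    b.ResetTimer()   // exclude setup time from the measurement
    for i := 0; i < b.N; i++ {
        if err := batch.Submit(ctx, request); err != nil {
            b.Fatal(err)
        }
    }
}
```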
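
To make "monitor memory allocations" concrete, the sketch below compares two ways of building a string with `testing.AllocsPerRun` from the standard library. The functions `joinConcat` and `joinBuilder` and the variable `sink` are hypothetical and unrelated to BatchFlow's code; they only show how an allocation count can be measured and compared.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// sink keeps results alive so the compiler cannot discard the work.
var sink string

// joinConcat builds the result with repeated string concatenation,
// which creates a new string on most iterations.
func joinConcat(parts []string) string {
	s := ""
	for _, p := range parts {
		s += p
	}
	return s
}

// joinBuilder sizes a strings.Builder once and then appends into it.
func joinBuilder(parts []string) string {
	n := 0
	for _, p := range parts {
		n += len(p)
	}
	var b strings.Builder
	b.Grow(n)
	for _, p := range parts {
		b.WriteString(p)
	}
	return b.String()
}

func main() {
	parts := make([]string, 64)
	for i := range parts {
		parts[i] = "column"
	}
	// AllocsPerRun reports the average number of heap allocations per call.
	concat := testing.AllocsPerRun(100, func() { sink = joinConcat(parts) })
	builder := testing.AllocsPerRun(100, func() { sink = joinBuilder(parts) })
	fmt.Printf("allocs per run: concat=%.0f builder=%.0f\n", concat, builder)
}
```

Inside a regular benchmark, calling `b.ReportAllocs()` or running `go test -bench . -benchmem` gives the same kind of per-operation allocation figures.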

📝 Code Style Guidelines

Go Code Style

Follow standard Go formatting (go fmt)
Use meaningful variable and function names
Write clear, concise comments
Keep functions small and focused
Handle errors appropriately

Documentation

Add GoDoc comments for public functions and types
Update README.md for significant changes
Include code examples in documentation
Document configuration options and their effects

Error Handling

```go
// Good: Specific error types
type ValidationError struct {
    Field   string
    Message string
}

func (e *ValidationError) Error() string {
    return fmt.Sprintf("validation error in field %s: %s", e.Field, e.Message)
}

// Good: Contextual error wrapping
if err := validateRequest(req); err != nil {
    return fmt.Errorf("failed to validate request: %w", err)
}
```

🏗️ Architecture Guidelines

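Based on the refactored architecture design (version v1.3.0)

Architecture Overview

BatchFlow uses a flexible layered architecture that supports different kinds of data sources through a unified BatchExecutor interface:

SQL databases: use ThrottledBatchExecutor + BatchProcessor + SQLDriver
NoSQL databases: implement the BatchExecutor interface directly
Message push / API calls: implement the BatchExecutor interface directly, which supports all kinds of custom batch tasks
Test environments: use MockExecutor, which implements the interface directly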

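Adding Support for a New SQL Data Source

Implement the SQLDriver interface:

```go
// drivers/newdb/driver.go
type NewDBDriver struct{}

func (d *NewDBDriver) GenerateInsertSQL(schema batchflow.SchemaInterface, data []map[string]any) (string, []any, error) {
    var (
        query string // database-specific SQL statement
        args  []any  // placeholder arguments, in statement order
    )
    // Generate the database-specific SQL statement here.
    // Handle the conflict strategies: ConflictIgnore, ConflictReplace, ConflictUpdate.
    return query, args, nil
}
```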
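
To give a feel for what the body of `GenerateInsertSQL` might do, here is a minimal sketch that builds one multi-row `INSERT` with `?` placeholders. It is not BatchFlow's implementation: `tableSchema` is a hypothetical stand-in for `batchflow.SchemaInterface`, conflict strategies are left out, and identifiers are not quoted, which a real driver would need to handle for its dialect.

```go
package main

import (
	"fmt"
	"strings"
)

// tableSchema is a hypothetical stand-in for batchflow.SchemaInterface;
// the real interface may expose different accessors.
type tableSchema struct {
	Name    string
	Columns []string
}

// generateInsertSQL builds a single multi-row INSERT statement and returns
// the flattened arguments in row order, then column order.
func generateInsertSQL(s tableSchema, rows []map[string]any) (string, []any, error) {
	if len(s.Columns) == 0 {
		return "", nil, fmt.Errorf("schema %s has no columns", s.Name)
	}
	if len(rows) == 0 {
		return "", nil, fmt.Errorf("no rows to insert into %s", s.Name)
	}
	// One "(?, ?, ...)" group per row.
	group := "(" + strings.TrimSuffix(strings.Repeat("?, ", len(s.Columns)), ", ") + ")"
	groups := make([]string, 0, len(rows))
	args := make([]any, 0, len(rows)*len(s.Columns))
	for _, row := range rows {
		groups = append(groups, group)
		for _, col := range s.Columns {
			args = append(args, row[col]) // a missing key yields nil (NULL)
		}
	}
	query := fmt.Sprintf("INSERT INTO %s (%s) VALUES %s",
		s.Name, strings.Join(s.Columns, ", "), strings.Join(groups, ", "))
	return query, args, nil
}

func main() {
	schema := tableSchema{Name: "test_table", Columns: []string{"id", "name"}}
	rows := []map[string]any{{"id": 1, "name": "a"}, {"id": 2, "name": "b"}}
	query, args, err := generateInsertSQL(schema, rows)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(query)
	fmt.Println(len(args), "arguments")
}
```

Placeholder syntax differs between databases (for example, PostgreSQL uses numbered placeholders such as `$1`), which is one reason the statement generation lives in a per-database driver.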

Create the executor factory:

```go
// drivers/newdb/executor.go
func NewBatchExecutor(db *sql.DB) *batchflow.ThrottledBatchExecutor {
    return batchflow.NewSQLThrottledBatchExecutorWithDriver(db, &NewDBDriver{})
}

func NewBatchExecutorWithDriver(db *sql.DB, driver batchflow.SQLDriver) *batchflow.ThrottledBatchExecutor {
    return batchflow.NewSQLThrottledBatchExecutorWithDriver(db, driver)
}
```

Add a BatchFlow factory method:

```go
// batchflow.go
func NewNewDBBatchFlow(ctx context.Context, db *sql.DB, config PipelineConfig) *BatchFlow {
    executor := newdb.NewBatchExecutor(db)
    return NewBatchFlow(ctx, config.BufferSize, config.FlushSize, config.FlushInterval, executor)
}
```

Adding Support for a New NoSQL Data Source

Implement the BatchExecutor interface directly:

```go
// drivers/newnosql/executor.go
type Executor struct {
    client *NewNoSQLClient
}

func (e *Executor) ExecuteBatch(ctx context.Context, schema batchflow.SchemaInterface, data []map[string]any) error {
    // Implement the database-specific batch operation directly.
    // There is no need to go through the BatchProcessor layer.
    return nil
}
```

Create a factory method:

```go
func NewBatchExecutor(client *NewNoSQLClient) *Executor {
    return &Executor{client: client}
}
```

Add a BatchFlow factory method:

```go
func NewNewNoSQLBatchFlow(ctx context.Context, client *NewNoSQLClient, config PipelineConfig) *BatchFlow {
    executor := newnosql.NewBatchExecutor(client)
    return NewBatchFlow(ctx, config.BufferSize, config.FlushSize, config.FlushInterval, executor)
}
```

Testing a New Data Source Driver

Unit test:

```go
func TestNewDBDriver_GenerateInsertSQL(t *testing.T) {
    driver := &NewDBDriver{}
    schema := &batchflow.Schema{
        Name:             "test_table",
        Columns:          []string{"id", "name"},
        ConflictStrategy: batchflow.ConflictIgnore,
    }
    data := []map[string]any{
        {"id": 1, "name": "test"},
    }

    sql, args, err := driver.GenerateInsertSQL(schema, data)
    assert.NoError(t, err)
    assert.Contains(t, sql, "INSERT")
    assert.Len(t, args, 2)
}
```

Integration test:

```go
func TestNewDBBatchFlow_Integration(t *testing.T) {
    ctx := context.Background()
    db := setupTestDB(t) // set up the test database
    defer db.Close()

    config := PipelineConfig{
        BufferSize:    100,
        FlushSize:     10,
        FlushInterval: time.Second,
    }
    batch := NewNewDBBatchFlow(ctx, db, config)

    // Test batch insert
    schema := NewSQLSchema("test_table", batchflow.ConflictIgnoreOperationConfig, "id", "name")
    request := NewRequest(schema).SetInt64("id", 1).SetString("name", "test")

    err := batch.Submit(ctx, request)
    assert.NoError(t, err)

    // Verify the inserted data
    // ...
}
```

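Architecture Best Practices

Choose the right implementation approach:
- SQL databases: use the ThrottledBatchExecutor architecture to reuse the common logic
- NoSQL databases: implement BatchExecutor directly to avoid unnecessary abstraction

Performance optimization:
- Use the database's own batch operation APIs
- Avoid memory allocations in hot paths
- Take advantage of the database's pipeline or batch features

Error handling:
- Provide clear error messages
- Distinguish transient errors from permanent errors
- Support retrying failed operations

Metrics collection:
- Implement the MetricsReporter interface
- Record execution time, batch size, and success/failure status
- Provide database-specific metrics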

Performance Considerations

Use pointer receivers for methods on large structs or on types whose state is mutated, so calls do not copy the value
Minimize memory allocations in hot paths
Consider using sync.Pool for frequently allocated objects
Profile code to identify bottlenecks
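
As an illustration of the `sync.Pool` suggestion, the sketch below reuses `bytes.Buffer` values across calls instead of allocating a new one each time. The function `formatRow` and the pool variable are hypothetical and only demonstrate the pattern; whether pooling pays off in a given hot path should be confirmed with a benchmark.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out reusable buffers; New runs only when the pool is empty.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// formatRow is a hypothetical hot-path function that needs scratch space.
func formatRow(id int, name string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()            // a pooled buffer may still hold old data
	defer bufPool.Put(buf) // return the buffer for later calls
	fmt.Fprintf(buf, "%d:%s", id, name)
	return buf.String() // String copies, so the result outlives the buffer
}

func main() {
	fmt.Println(formatRow(1, "alice"))
	fmt.Println(formatRow(2, "bob"))
}
```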

🐛 Bug Reports and Feature Requests

Reporting Bugs

Check existing issues first
Use the bug report template
Provide minimal reproduction case
Include environment details
Add relevant logs and error messages

Requesting Features

Use the feature request template
Explain the use case and problem
Propose a solution
Consider backwards compatibility
Discuss API design implications

🔄 Review Process

Pull Request Requirements

All tests pass
Code coverage maintained or improved
Documentation updated
No linting errors
Backwards compatibility preserved (unless breaking change is justified)

Review Criteria

Functionality: Does the code work as intended?
Performance: Are there any performance regressions?
Security: Are there any security implications?
Maintainability: Is the code easy to understand and maintain?
Testing: Are there adequate tests?
Documentation: Is the documentation clear and complete?

📊 CI/CD Pipeline

Automated Checks

Code formatting (go fmt)
Linting (golangci-lint)
Unit tests with coverage
Integration tests (MySQL, PostgreSQL, SQLite)
Performance benchmarks

Manual Testing

Test with different Go versions
Verify on different operating systems
Test with various database versions
Performance testing under load

🎯 Project Priorities

Current Focus Areas

Performance Optimization: Improving throughput and reducing latency
Error Handling: Better error messages and recovery mechanisms
Documentation: Comprehensive guides and examples
Testing: Increasing test coverage and reliability

Future Roadmap

Additional database support (TiDB, ClickHouse)
Monitoring and metrics integration
Connection pool optimization
Advanced batching strategies

🤝 Community Guidelines

Code of Conduct

Be respectful and inclusive
Provide constructive feedback
Help newcomers get started
Focus on technical merit
Maintain professional communication

Getting Help

Check existing documentation first
Search closed issues for similar problems
Ask questions in GitHub Discussions
Provide context and examples when asking for help

📚 Resources

Documentation

README.md - Project overview and basic usage
CONFIG.md - Configuration options
README-INTEGRATION-TESTS.md - Integration testing guide

Development Tools

golangci-lint - Go linting
Docker - Containerization
Make - Build automation

Learning Resources

📞 Contact

Issues: GitHub Issues
Discussions: GitHub Discussions
Security: Report security issues privately via email

Thank you for contributing to BatchFlow! Your efforts help make this project better for everyone. 🙏
