71 changes: 71 additions & 0 deletions .claude/settings.json
@@ -0,0 +1,71 @@
{
  "$schema": "https://json.schemastore.org/claude-code-settings.json",
  "companyAnnouncements": [
    "Welcome to the Telemetry-controller project!\nIf you have any questions or need assistance, feel free to reach out to the maintainers or the community.\nWe are excited to have you on board and look forward to your contributions."
  ],
  "model": "sonnet",
  "availableModels": ["haiku", "sonnet", "opus"],
  "cleanupPeriodDays": 14,
  "env": {
    "DISABLE_TELEMETRY": "1",
    "DISABLE_ERROR_REPORTING": "1",
    "CLAUDE_CODE_DISABLE_FEEDBACK_SURVEY": "1",
    "DISABLE_NON_ESSENTIAL_MODEL_CALLS": "1"
  },
  "includeCoAuthoredBy": false,
  "permissions": {
    "deny": [
      "Read(./.env)",
      "Read(./.env.*)",
      "Read(./.envrc)",
      "Read(./.git/**)",
      "Read(./secrets/**)",
      "Read(./**/*.pem)",
      "Read(./**/*.key)",
      "Bash(curl *)",
      "Bash(wget *)",
      "Read(./**/*.pfx)",
      "Read(./**/*.p12)",
      "Read(./**/*.jks)",
      "Read(./**/*.keystore)",
      "Read(./**/credentials*)",
      "Read(./**/token*)",
      "Read(./**/*secret*)",
      "Read(./**/id_rsa*)",
      "Read(./**/id_ed25519*)",
      "Read(./**/.htpasswd)",
      "Read(./**/*.sqlite)",
      "Bash(rm -rf *)",
      "Bash(rm -fr *)",
      "Bash(git push *)",
      "Bash(*exec*)",
      "Bash(sudo *)",
      "Bash(*chmod 777*)",
      "Bash(*chmod -R 777*)",
      "Bash(ssh *)",
      "Bash(* > /dev/*)",
      "Bash(* >> /dev/*)"
    ],
    "ask": [
      "Bash",
      "WebFetch"
    ],
    "defaultMode": "plan",
    "disableBypassPermissionsMode": "disable"
  },
  "sandbox": {
    "enabled": true,
    "autoAllowBashIfSandboxed": false,
    "excludedCommands": [],
    "allowUnsandboxedCommands": false
  },
  "enableAllProjectMcpServers": false,
  "enabledPlugins": {
    "context7@claude-plugins-official": true,
    "feature-dev@claude-plugins-official": true,
    "code-simplifier@claude-plugins-official": true,
    "security-guidance@claude-plugins-official": true,
    "claude-md-management@claude-plugins-official": true,
    "gopls-lsp@claude-plugins-official": true
  }
}
2 changes: 2 additions & 0 deletions .gitignore
Expand Up @@ -36,3 +36,5 @@ go.work.sum
# Ignore DevSpace cache and log folder and DevSpace configuration file
.devspace/
devspace.yaml

.envrc
9 changes: 9 additions & 0 deletions .golangci.yml
Expand Up @@ -4,6 +4,11 @@ run:
allow-parallel-runners: true

formatters:
enable:
- gci
- gofmt
- gofumpt
- goimports
settings:
gci:
sections:
Expand All @@ -22,6 +27,8 @@ linters:
settings:
misspell:
locale: US
revive:
confidence: 0.9
gocyclo:
min-complexity: 15
enable:
Expand All @@ -31,6 +38,8 @@ linters:
- ineffassign
- misspell
- nolintlint
- revive
- gocyclo
- unconvert
- unparam
- unused
205 changes: 205 additions & 0 deletions CLAUDE.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,205 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

This is a Kubernetes operator that provides multi-tenant telemetry data collection and routing. It deploys and configures:
- **OpenTelemetry Collector** (DaemonSet) — collects logs, metrics, and traces from containers
- **Routing infrastructure** — tenant-based isolation and subscription-driven data routing

Cluster operators define `Collector` and `Tenant` CRDs to set up multi-tenant isolation. Users define `Subscription` and `Output` CRDs to route telemetry data to destinations via OTLP, Fluentforward, or local file outputs.

## Project Structure

```
telemetry-controller/
├── .github/  # GitHub Actions workflows
├── api/telemetry/v1alpha1/  # CRD type definitions
│   ├── common.go  # Shared types (NamespacedName helper)
│   ├── collector_types.go  # Global OTEL Collector DaemonSet configuration
│   ├── tenant_types.go  # Routing rules and namespace selectors
│   ├── subscription_types.go  # User-defined data selection and outputs
│   ├── output_types.go  # Output types (OTLP, file, fluentforward, etc.)
│   ├── bridge_types.go  # Cross-tenant routing
│   └── otlp_config.go  # OTLP component configuration models (mainly used for outputs)
├── charts/telemetry-controller/  # Helm chart (CRDs synced via make manifests)
├── cmd/main.go  # Operator entry point (controller manager setup)
├── controllers/telemetry/
│   ├── collector_controller.go  # Manages Collector CR and OTEL configuration
│   └── route_controller.go  # Manages Tenants and watches Subscriptions/Outputs
├── e2e/  # KIND-based bash e2e test suites
└── pkg/
    ├── resources/
    │   ├── manager/  # CR manager abstractions (extends BaseManager)
    │   │   ├── bridge_manager.go
    │   │   ├── collector_manager.go
    │   │   ├── manager.go  # CRUD operations and logging
    │   │   └── tenant_resource_manager.go  # Collector-specific orchestration
    │   ├── otel_conf_gen/  # OpenTelemetry Collector config generator
    │   │   ├── pipeline/
    │   │   │   └── components/  # Receivers, processors, connectors, exporters
    │   │   ├── otel_conf_gen.go  # OTEL config generation entry point
    │   │   └── validator/  # Configuration validation
    │   └── problem/  # Problem/issue tracking for status conditions
    └── sdk/
        ├── model/
        └── utils/  # Helper functions
```

## Module Structure

The repository contains a single Go module (`go.mod`) for the operator binary, controllers, API types, and resource generators. All types are in `api/telemetry/v1alpha1/`.

## Build & Development Commands

```bash
# Build operator binary
make build

# Verify no uncommitted generated files (useful in CI)
make check-diff

# Full generation cycle after any API type change (codegen + fmt + manifests)
make generate

# Format all code
make fmt

# Run go vet
make vet

# Run all linters (golangci-lint)
make lint
make lint-fix # auto-fix issues

# Run all tests
make test

# Run a single test
go test -run TestName -v ./controllers/telemetry/...
# or for pkg tests:
go test -run TestName -v ./pkg/resources/...

# Run e2e tests on KIND cluster
make test-e2e

# Install CRDs into cluster
make install

# Remove CRDs from cluster
make uninstall

# Run operator locally (hot-reload during development)
make run

# Build and deploy to cluster
make docker-build IMG=telemetry-controller:latest
kind load docker-image telemetry-controller:latest
make deploy IMG=telemetry-controller:latest

# Remove controller from cluster
make undeploy


# Run with delve debugger (remote debug on :2345)
make run-delve
```

## Architecture

### CRD API (`api/telemetry/v1alpha1/`)

Core resource types:
- **`Collector`** — cluster-scoped root resource; defines global OTEL Collector DaemonSet settings and tenant selection via `tenantSelector`
- **`Tenant`** — cluster-scoped routing rules with namespace selectors:
  - `logSourceNamespaceSelectors` — where logs originate
  - `subscriptionNamespaceSelectors` — where users can create subscriptions
  - `routeConfig` — routing connector configuration (default pipelines, error handling)
  - `persistence` — optional file storage configuration for buffering
- **`Bridge`** — cluster-scoped advanced routing for cross-tenant scenarios
- **`Subscription`** — namespace-scoped data selection rules with output references; users create these in their tenant's allowed namespaces
- **`Output`** — namespace-scoped telemetry destinations; see `api/telemetry/v1alpha1/output_types.go` for supported exporter types

### Controllers (`controllers/telemetry/`)

- **`CollectorReconciler`** — primary reconciler; orchestrates:
  - OTEL Collector DaemonSet creation/update
  - ConfigMap generation from Tenant/Subscription/Output routing rules
  - RBAC (ServiceAccount, ClusterRole, ClusterRoleBinding)
  - Integration with opentelemetry-operator via `OpenTelemetryCollector` CR
- **`RouteController`** — watches Tenants, Subscriptions, Outputs, and Bridges to trigger Collector reconciliation when routing rules change

### Resource Generation (`pkg/resources/`)

- `manager/` — Manager pattern abstractions:
  - `BaseManager` — CRUD operations, logging, error handling
  - `CollectorManager` — extends BaseManager with Collector-specific methods; see `pkg/resources/manager/collector_manager.go`
- `otel_conf_gen/` — OTEL Collector configuration generator:
  - `pipeline/components/` — receiver, processor, connector, exporter generators
  - `validator/` — validates generated configuration against OTEL schema
- `problem/` — Problem tracking for status conditions (severity levels, descriptions)

### Key Patterns

**Manager Pattern**: Controllers use manager wrappers that extend `BaseManager` for consistent CRUD operations:
```go
collectorManager := &manager.CollectorManager{
	BaseManager: manager.NewBaseManager(r.Client, log.FromContext(ctx)),
}
```

**Immutability**: CRITICAL — this codebase follows immutable data patterns. Never mutate objects in place:
```go
// WRONG
obj.Spec.Field = value

// CORRECT
newObj := obj.DeepCopy()
newObj.Spec.Field = value
```

**Status Management**: All CRs use structured status with:
- `State` field (`StateReady`, `StateFailed`) from `pkg/sdk/model/state`
- `Problems` array (severity, description) from `pkg/resources/problem`
- Always compare original status before updating to avoid unnecessary API calls
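
The "compare before updating" rule can be sketched with the standard library alone. The `Status` and `Problem` types below are hypothetical stand-ins, not the project's real types, and `needsStatusUpdate` is an illustrative helper name:

```go
package main

import (
	"fmt"
	"reflect"
)

// Problem is a hypothetical stand-in for an entry in the Problems array.
type Problem struct {
	Severity    string
	Description string
}

// Status is a hypothetical stand-in for a CR status with State and Problems.
type Status struct {
	State    string
	Problems []Problem
}

// needsStatusUpdate reports whether the desired status differs from the
// status observed at the start of reconciliation. Only when it returns
// true would a status update call be issued.
func needsStatusUpdate(original, desired Status) bool {
	return !reflect.DeepEqual(original, desired)
}

func main() {
	original := Status{State: "ready"}
	unchanged := Status{State: "ready"}
	failed := Status{
		State:    "failed",
		Problems: []Problem{{Severity: "error", Description: "output not found"}},
	}

	fmt.Println(needsStatusUpdate(original, unchanged)) // false
	fmt.Println(needsStatusUpdate(original, failed))    // true
}
```

Note that `reflect.DeepEqual` treats a nil slice and an empty slice as different. Code working with real Kubernetes objects would more likely use `equality.Semantic.DeepEqual` from `k8s.io/apimachinery/pkg/api/equality`, which treats them as equal.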

**Multi-tenancy**: Tenants define namespace selectors that create isolation boundaries. Subscriptions must be in a namespace matching the tenant's `subscriptionNamespaceSelectors` and can only access logs from namespaces matching `logSourceNamespaceSelectors`.
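
As a rough illustration of the isolation check, the sketch below tests namespace labels against a list of selectors. It is a simplification under stated assumptions: `Selector` models only exact label matches, the "any selector may match" semantics are assumed rather than taken from the project, and all names are hypothetical.

```go
package main

import "fmt"

// Selector is a hypothetical, simplified label selector: every key/value
// pair must be present in the namespace labels. An empty selector matches
// every namespace.
type Selector map[string]string

// matches reports whether the labels satisfy one selector.
func matches(sel Selector, labels map[string]string) bool {
	for key, want := range sel {
		if got, ok := labels[key]; !ok || got != want {
			return false
		}
	}
	return true
}

// matchesAny reports whether the labels satisfy at least one selector.
// A nil or empty selector list matches nothing.
func matchesAny(selectors []Selector, labels map[string]string) bool {
	for _, sel := range selectors {
		if matches(sel, labels) {
			return true
		}
	}
	return false
}

func main() {
	subscriptionSelectors := []Selector{{"tenant": "web"}}

	webNamespace := map[string]string{"tenant": "web", "env": "prod"}
	dbNamespace := map[string]string{"tenant": "db"}

	fmt.Println(matchesAny(subscriptionSelectors, webNamespace)) // true
	fmt.Println(matchesAny(subscriptionSelectors, dbNamespace))  // false
}
```

The same check would apply to log sources, with the tenant's `logSourceNamespaceSelectors` in place of the subscription selectors.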

**Configuration Generation**: The reconciliation flow is:
```
Collector CR → BuildConfigInputForCollector() → GetOtelColConfig()
  → Generate YAML config → Create/Update OpenTelemetryCollector CR
  → opentelemetry-operator manages DaemonSet
```

**Error Handling**: Tenant failures trigger `requeueDelayOnFailedTenant` (20s) to prevent hot loops. Uses `emperror.dev/errors` for rich error context.
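
The delayed requeue can be sketched as follows. Only the constant name and its 20s value come from the text above; `Result` is a hypothetical stand-in that mirrors the `RequeueAfter` field of controller-runtime's `ctrl.Result`, and `resultForTenants` and `errOutputMissing` are illustrative names.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// requeueDelayOnFailedTenant is the delay applied when a tenant fails.
const requeueDelayOnFailedTenant = 20 * time.Second

// Result is a hypothetical stand-in for a reconcile result.
type Result struct {
	RequeueAfter time.Duration
}

// errOutputMissing is an example tenant failure used for illustration.
var errOutputMissing = errors.New("output not found")

// resultForTenants requests a delayed requeue if any tenant reported an
// error, instead of retrying immediately in a hot loop.
func resultForTenants(tenantErrs map[string]error) Result {
	for _, err := range tenantErrs {
		if err != nil {
			return Result{RequeueAfter: requeueDelayOnFailedTenant}
		}
	}
	return Result{}
}

func main() {
	healthy := map[string]error{"tenant-a": nil}
	broken := map[string]error{"tenant-a": nil, "tenant-b": errOutputMissing}

	fmt.Println(resultForTenants(healthy).RequeueAfter) // 0s
	fmt.Println(resultForTenants(broken).RequeueAfter)  // 20s
}
```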

**Requeue on Dependency Changes**: `RouteController` watches Tenants, Subscriptions, Outputs, and Bridges to trigger Collector reconciliation via `handler.EnqueueRequestsFromMapFunc`.
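
The mapping step behind such a watch can be sketched without controller-runtime. The sketch assumes that a change to any dependency re-enqueues every known Collector, which may be broader than what the project actually does. `Request` and `requestsForCollectors` are hypothetical names. In a real controller, a function of this kind would be wrapped by `handler.EnqueueRequestsFromMapFunc`.

```go
package main

import (
	"fmt"
	"sort"
)

// Request is a hypothetical stand-in for a reconcile request keyed by
// the name of a cluster-scoped Collector.
type Request struct {
	Name string
}

// requestsForCollectors returns one request per distinct Collector name,
// sorted by name so the result is deterministic.
func requestsForCollectors(collectorNames []string) []Request {
	seen := make(map[string]struct{}, len(collectorNames))
	requests := make([]Request, 0, len(collectorNames))
	for _, name := range collectorNames {
		if _, duplicate := seen[name]; duplicate {
			continue
		}
		seen[name] = struct{}{}
		requests = append(requests, Request{Name: name})
	}
	sort.Slice(requests, func(i, j int) bool {
		return requests[i].Name < requests[j].Name
	})
	return requests
}

func main() {
	// A Subscription changed; enqueue each Collector once.
	fmt.Println(requestsForCollectors([]string{"edge", "cluster", "edge"}))
}
```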

## Testing

- Unit/integration tests use `envtest` (embedded Kubernetes API server); no cluster needed
- Integration tests in `controllers/telemetry/suite_test.go` use Ginkgo/Gomega BDD framework
- E2E tests in `e2e/e2e_test.sh` use KIND and test full deployment scenarios
- Component tests in `pkg/resources/otel_conf_gen/` validate configuration generation

## Requirements

- Go 1.25.0 (see `.go-version`)
- envtest uses Kubernetes 1.35.0 (set via `ENVTEST_K8S_VERSION` in Makefile)

## Key Configuration

Operator flags (set in Deployment args):
- `--metrics-bind-address` — metrics server address (default: `:8443`)
- `--health-probe-bind-address` — health probe address (default: `:8081`)
- `--leader-elect` — enable leader election for HA
- `--metrics-secure` — serve metrics over HTTPS (default: `true`)
- `--enable-http2` — enable HTTP/2 for metrics and webhook servers (default: `false`)
- `--zap-log-level` — logging verbosity

Environment variables:
- `KUBECONFIG` — path to kubeconfig file

**Note:** Webhook infrastructure is scaffolded in `cmd/main.go` but no validation webhook handlers are registered — webhooks are not yet implemented.
54 changes: 50 additions & 4 deletions README.md
Expand Up @@ -124,11 +124,57 @@ Paste this token to the example manifests:
sed -i '' -e "s/\<TOKEN\>/INSERT YOUR COPIED TOKEN HERE/" docs/examples/simple-demo/one_tenant_two_subscriptions.yaml
```

**Note: Telemetry Controller supports batching, you can enable it by adding it to your `output` definition.**
**Note: Telemetry Controller supports batching. The recommended approach is output batching via `sending_queue.batch` (see below). The batch processor (`spec.batch`) is also available but not recommended for new configurations.**

We recommend the following settings:
### Output Batching (recommended)

### Low-Latency settings
Output batching configures batching inside the sending queue of the exporter itself.

Batching is disabled by default. To enable it with default settings, use `batch: {}`.

#### Low-latency settings

Prioritizes sending small, frequent batches — useful when minimal transmission delay is critical:

```yaml
apiVersion: telemetry.kube-logging.dev/v1alpha1
kind: Output
metadata:
  name: ll-output
  namespace: example
spec:
  otlp:
    endpoint: example
    sending_queue:
      batch:
        flush_timeout: 200ms
        min_size: 8192
```

#### Archival settings

Maximizes throughput by waiting for large batches — useful when efficiency is prioritized over latency:

```yaml
apiVersion: telemetry.kube-logging.dev/v1alpha1
kind: Output
metadata:
  name: archival-output
  namespace: example
spec:
  otlp:
    endpoint: example
    sending_queue:
      batch:
        flush_timeout: 60s
        min_size: 1048576
```

### Batch processor (not recommended for new configurations)

The `spec.batch` field inserts a batch processor into the OTEL pipeline before the exporter. Prefer output batching via `sending_queue.batch` for new configurations.

#### Low-Latency settings

This configuration prioritizes sending small, frequent batches over achieving efficiency through larger batch sizes. It is useful for scenarios where minimal delay in data transmission is critical:

Expand All @@ -151,7 +197,7 @@ spec:
endpoint: example
```

### Archival settings
#### Archival settings

This configuration maximizes resource usage, making it ideal for batch processing and data archival. It is useful for scenarios where efficiency and throughput are prioritized over immediate transmission:
