This guide covers strategies for scaling Pangolin to handle growing data volumes, user bases, and query loads.
When to Scale Up
- Single-tenant deployments with predictable growth
- Cost-effective for small to medium workloads
- Simpler operational model
Limits
- Hardware constraints (CPU, RAM, disk)
- Single point of failure
- Downtime required for upgrades
When to Scale Out
- Multi-tenant deployments
- Unpredictable or bursty workloads
- High availability requirements
- Large-scale deployments (100+ users, 1000+ tables)
Pangolin API servers are stateless, enabling easy horizontal scaling.
Load Balancing Strategy
┌──────────────┐
│ Load Balancer│
└──────┬───────┘
│
┌───┴────┬────────┬────────┐
│ │ │ │
┌──▼──┐ ┌──▼──┐ ┌──▼──┐ ┌──▼──┐
│ API │ │ API │ │ API │ │ API │
│ 1 │ │ 2 │ │ 3 │ │ 4 │
└──┬──┘ └──┬──┘ └──┬──┘ └──┬──┘
│ │ │ │
└────────┴────────┴────────┘
│
┌──────▼──────┐
│ PostgreSQL │
└─────────────┘
Scaling Triggers
- CPU utilization > 70%
- Request latency p95 > 500ms
- Request queue depth > 100
Auto-Scaling Configuration (Kubernetes)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: pangolin-api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: pangolin-api
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
Read Replicas
┌─────────────┐
│ Primary │ ◄── Writes
└──────┬──────┘
│ Replication
┌───┴────┬────────┐
│ │ │
┌──▼──┐ ┌──▼──┐ ┌──▼──┐
│Read │ │Read │ │Read │ ◄── Reads
│Rep 1│ │Rep 2│ │Rep 3│
└─────┘ └─────┘ └─────┘
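The topology above requires application-side routing: writes must go to the primary, while reads can be spread across replicas. A minimal sketch of that routing logic, assuming plain DSN strings handed to your database driver (class and host names here are illustrative, not Pangolin configuration):

```python
import itertools

class ReplicaRouter:
    """Route writes to the primary; round-robin reads over replicas."""

    def __init__(self, primary_dsn, replica_dsns):
        self.primary_dsn = primary_dsn
        self._replicas = itertools.cycle(replica_dsns)

    def dsn_for(self, statement):
        # Crude verb sniffing; a real router would inspect transaction state too.
        verb = statement.lstrip().split(None, 1)[0].upper()
        if verb in ("INSERT", "UPDATE", "DELETE", "CREATE", "ALTER", "DROP"):
            return self.primary_dsn
        return next(self._replicas)

router = ReplicaRouter(
    "host=primary dbname=pangolin",
    ["host=replica1 dbname=pangolin", "host=replica2 dbname=pangolin"],
)
print(router.dsn_for("SELECT * FROM assets"))      # first replica
print(router.dsn_for("INSERT INTO assets ..."))    # primary
```

Note that replicas lag the primary slightly, so read-your-writes flows should pin to the primary for the duration of the request.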
Connection Pooling
# PgBouncer configuration
[databases]
pangolin = host=localhost port=5432 dbname=pangolin
[pgbouncer]
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20
reserve_pool_size = 5
Partitioning Strategy
-- Partition audit logs by month
CREATE TABLE audit_logs (
    id UUID PRIMARY KEY,
    timestamp TIMESTAMP NOT NULL,
    action VARCHAR(50),
    ...
) PARTITION BY RANGE (timestamp);

CREATE TABLE audit_logs_2025_01 PARTITION OF audit_logs
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
Sharding
// Shard by tenant_id for multi-tenancy
sh.shardCollection("pangolin.assets", { tenant_id: 1, catalog: 1 })
// Shard by catalog for single-tenant
sh.shardCollection("pangolin.assets", { catalog: 1, namespace: 1 })
Shared Infrastructure (Cost-Effective)
- All tenants share same API servers and database
- Logical isolation via tenant_id
- Suitable for: 100s-1000s of small tenants
Dedicated Infrastructure (High Isolation)
- Each tenant gets dedicated resources
- Physical isolation
- Suitable for: Enterprise customers, compliance requirements
Hybrid Approach (Recommended)
- Shared API servers
- Tenant-specific database schemas or databases
- Balance of cost and isolation
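One way to realize the hybrid model is a PostgreSQL schema per tenant, provisioned when the tenant is onboarded. A minimal sketch; the `tenant_` prefix, statement shapes, and table names are illustrative assumptions, not Pangolin's actual DDL:

```python
import re

def tenant_schema_ddl(tenant_id):
    """Generate per-tenant schema provisioning statements (hypothetical shapes)."""
    # Validate the id before interpolating it into DDL to avoid SQL injection.
    if not re.fullmatch(r"[a-z][a-z0-9_]{0,62}", tenant_id):
        raise ValueError(f"unsafe tenant id: {tenant_id!r}")
    schema = f"tenant_{tenant_id}"
    return [
        f"CREATE SCHEMA IF NOT EXISTS {schema};",
        f"CREATE TABLE IF NOT EXISTS {schema}.assets (LIKE public.assets INCLUDING ALL);",
        f"SET search_path TO {schema}, public;",
    ]

for stmt in tenant_schema_ddl("acme"):
    print(stmt)
```

Schema-per-tenant keeps the shared connection pool and backups of the shared model while giving each tenant its own namespace for quotas, migrations, and per-tenant exports.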
-- Monitor per-tenant resource usage
SELECT
    tenant_id,
    COUNT(*) AS table_count,
    SUM(size_bytes) AS total_size
FROM assets
GROUP BY tenant_id
ORDER BY total_size DESC;
Flat Structure (Small Scale)
catalog
├── namespace1
├── namespace2
└── namespace3
Hierarchical Structure (Large Scale)
catalog
├── department
│ ├── team1
│ │ ├── project_a
│ │ └── project_b
│ └── team2
└── shared
Best Practices
- Limit namespace depth to 3-4 levels
- Use consistent naming conventions
- Implement namespace quotas
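The three practices above can be enforced at namespace-creation time. A minimal validator sketch; the naming regex, depth limit, and quota default are assumptions to adapt to your conventions:

```python
import re

MAX_DEPTH = 4                               # matches the 3-4 level guidance above
NAME_RE = re.compile(r"[a-z][a-z0-9_]*$")   # assumed naming convention

def validate_namespace(path, existing_count=0, quota=100):
    """Return an error string if the dotted path breaks a rule, else None."""
    parts = path.split(".")
    if len(parts) > MAX_DEPTH:
        return f"too deep: {len(parts)} levels (max {MAX_DEPTH})"
    for part in parts:
        if not NAME_RE.match(part):
            return f"bad segment name: {part!r}"
    if existing_count >= quota:
        return f"quota exceeded: {existing_count}/{quota}"
    return None

print(validate_namespace("department.team1.project_a"))  # None (valid)
print(validate_namespace("a.b.c.d.e"))                   # too deep
```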
Anti-Pattern: Single Mega-Catalog
❌ production (10,000 tables)
Best Practice: Multiple Catalogs
✅ sales_analytics (500 tables)
✅ marketing_data (300 tables)
✅ finance_reports (200 tables)
✅ operations_metrics (400 tables)
Audit Log Archival
-- Archive logs older than 90 days
-- (run both statements in one transaction so rows are never lost mid-archive)
BEGIN;
INSERT INTO audit_logs_archive
SELECT * FROM audit_logs
WHERE timestamp < NOW() - INTERVAL '90 days';
DELETE FROM audit_logs
WHERE timestamp < NOW() - INTERVAL '90 days';
COMMIT;
Snapshot Cleanup
# Remove snapshots older than retention policy
# Implement via maintenance operations
pangolin-admin optimize-catalog --catalog production \
  --remove-old-snapshots --retention-days 30
Essential Indexes
-- Catalog operations
CREATE INDEX idx_assets_catalog_namespace ON assets(catalog_id, namespace);
CREATE INDEX idx_assets_tenant ON assets(tenant_id);
-- Search and discovery
CREATE INDEX idx_assets_name ON assets(name);
CREATE INDEX idx_assets_kind ON assets(kind);
-- Audit queries
CREATE INDEX idx_audit_tenant_time ON audit_logs(tenant_id, timestamp DESC);
CREATE INDEX idx_audit_action ON audit_logs(action);
Redis Configuration
# Cache frequently accessed metadata
cache:
  type: redis
  host: redis-cluster
  ttl: 300  # 5 minutes
  # Cache keys
  patterns:
    - "catalog:*:metadata"
    - "namespace:*:list"
    - "permissions:user:*"
Cache Invalidation
- Invalidate on write operations
- Use pub/sub for multi-instance coordination
- Implement cache warming for critical data
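The three rules above can be modeled with a small in-process stand-in for Redis pub/sub: a write publishes a key pattern, and every subscribed instance evicts matching entries from its local cache. This sketch captures only the coordination logic; with real Redis you would use PUBLISH/SUBSCRIBE channels instead of the in-memory bus shown here:

```python
import fnmatch

class InvalidationBus:
    """In-memory stand-in for a pub/sub channel shared by API instances."""

    def __init__(self):
        self._caches = []

    def subscribe(self, cache):
        self._caches.append(cache)

    def publish(self, pattern):
        # Every instance evicts keys matching the published glob pattern.
        for cache in self._caches:
            for key in [k for k in cache if fnmatch.fnmatch(k, pattern)]:
                del cache[key]

bus = InvalidationBus()
cache_a = {"catalog:sales:metadata": "...", "namespace:sales:list": "..."}
cache_b = {"catalog:sales:metadata": "..."}
bus.subscribe(cache_a)
bus.subscribe(cache_b)

bus.publish("catalog:sales:*")   # a write to the sales catalog
print(sorted(cache_a))           # ['namespace:sales:list']
print(sorted(cache_b))           # []
```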
# Cache expensive queries
# (assumes an application-level TTL cache decorator, e.g. backed by Redis)
@cache(ttl=600)  # 10 minutes
def get_catalog_summary(catalog_id):
    return {
        'table_count': count_tables(catalog_id),
        'total_size': calculate_size(catalog_id),
        'namespaces': list_namespaces(catalog_id),
    }
Prefix Distribution
# Anti-pattern: Sequential prefixes
s3://bucket/table-00001/
s3://bucket/table-00002/
s3://bucket/table-00003/
# Best practice: Hash-based prefixes
s3://bucket/a7f3/table-00001/
s3://bucket/b2e9/table-00002/
s3://bucket/c4d1/table-00003/
Request Rate Limits
- S3: 3,500 PUT/COPY/POST/DELETE, 5,500 GET/HEAD per prefix per second
- Use multiple prefixes for high-throughput workloads
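Hash-based prefixes like those above can be derived deterministically from the table name, so any writer computes the same key without coordination. A minimal sketch; the helper name, bucket, and 4-character prefix length are illustrative assumptions:

```python
import hashlib

def hashed_key(table_name, bucket="bucket"):
    """Derive a short hash prefix so objects spread across S3 key partitions."""
    # MD5 is fine here: we only need dispersion, not cryptographic strength.
    prefix = hashlib.md5(table_name.encode()).hexdigest()[:4]
    return f"s3://{bucket}/{prefix}/{table_name}/"

print(hashed_key("table-00001"))
print(hashed_key("table-00002"))
```

Because each distinct 4-hex prefix gets its own request-rate budget, this spreads a high-throughput workload across up to 65,536 prefixes instead of one.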
Multi-Region Deployment
Primary Region: us-east-1
├── Catalog metadata: RDS us-east-1
└── Table data: S3 us-east-1
Secondary Region: eu-west-1
├── Read replica: RDS eu-west-1
└── Replicated data: S3 eu-west-1
API Response Times
- List catalogs: < 100ms (p95)
- Get table metadata: < 50ms (p95)
- Create table: < 200ms (p95)
- Search: < 500ms (p95)
Throughput
- 1,000 requests/second per API instance
- 10,000 concurrent users per deployment
Scale Limits
- 100,000 tables per catalog
- 1,000 catalogs per deployment
- 10,000 users per tenant
-- Track growth trends
SELECT
    DATE_TRUNC('day', created_at) AS date,
    COUNT(*) AS new_tables,
    SUM(COUNT(*)) OVER (ORDER BY DATE_TRUNC('day', created_at)) AS cumulative
FROM assets
GROUP BY DATE_TRUNC('day', created_at)
ORDER BY date DESC
LIMIT 30;
Forecast Resource Needs
- Monitor growth rate (tables, users, queries)
- Project 6-12 months ahead
- Plan scaling actions proactively
Alerting Thresholds
- Database connections > 80% of pool
- Disk usage > 70%
- API latency p95 > 500ms
- Error rate > 1%
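The thresholds above translate directly into an alert-evaluation step in your monitoring pipeline. A minimal sketch; the metric names and percentage units are assumptions, not Pangolin's actual metric schema:

```python
# Threshold table mirroring the alerting list above.
THRESHOLDS = {
    "db_connection_pool_pct": 80,   # > 80% of pool in use
    "disk_usage_pct": 70,           # > 70% disk used
    "api_latency_p95_ms": 500,      # > 500 ms p95 latency
    "error_rate_pct": 1,            # > 1% errors
}

def check_alerts(metrics):
    """Return the names of metrics that breach their thresholds."""
    return [name for name, limit in THRESHOLDS.items()
            if metrics.get(name, 0) > limit]

print(check_alerts({"disk_usage_pct": 82, "api_latency_p95_ms": 120}))
# ['disk_usage_pct']
```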
Right-Sizing
- Start small, scale based on actual usage
- Use auto-scaling to handle peaks
- Scale down during off-hours
Storage Tiering
# Move old data to cheaper storage
aws s3 mv s3://hot-bucket/old-data/ s3://archive-bucket/old-data/ \
  --recursive --storage-class GLACIER_IR
Shared Resources
- API servers: Shared across all tenants
- Database: Shared with logical isolation
- Monitoring: Centralized
Dedicated Resources (Premium Tier)
- Dedicated API instances
- Isolated database
- Priority support
Scaling Checklist
- Identify the bottleneck (CPU, memory, database, network)
- Review current resource utilization
- Analyze query patterns and hot spots
- Test scaling strategy in staging
- Monitor metrics continuously
- Validate health checks
- Test failover scenarios
- Verify data consistency
- Measure performance improvement
- Update capacity planning
- Document changes
- Review cost impact