Warehouse Management

Overview

Warehouses in Pangolin define storage configurations and credential management strategies for your Iceberg tables. Each warehouse represents a storage backend (S3, Azure Blob, GCS) and controls how clients access data.

Key Concepts:

  • Warehouse: Storage configuration and credential vending settings
  • Catalog: References a warehouse and defines a storage location
  • Credential Vending: Automatic provisioning of temporary credentials to clients

Creating a Warehouse

Basic Warehouse (Static Credentials)

curl -X POST http://localhost:8080/api/v1/warehouses \
  -H "X-Pangolin-Tenant: <tenant-id>" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "dev_warehouse",
    "use_sts": false,
    "storage_config": {
      "type": "s3",
      "bucket": "my-dev-bucket",
      "region": "us-east-1"
    }
  }'

Configuration:

  • use_sts: false - Clients use static credentials from their environment
  • Suitable for development and testing
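The same request can be assembled from a script. The following is a minimal sketch using only Python's standard library; `BASE_URL` and the function name `build_create_warehouse_request` are illustrative assumptions, and the request is built but not sent so it can be inspected first.

```python
import json
import urllib.request

# Assumption: same placeholder host as the curl example above.
BASE_URL = "http://localhost:8080"


def build_create_warehouse_request(tenant_id, token, name, bucket, region):
    """Build (but do not send) the POST request for a static-credential warehouse."""
    payload = {
        "name": name,
        "use_sts": False,  # clients supply their own static credentials
        "storage_config": {"type": "s3", "bucket": bucket, "region": region},
    }
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/warehouses",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "X-Pangolin-Tenant": tenant_id,
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


req = build_create_warehouse_request(
    "<tenant-id>", "<token>", "dev_warehouse", "my-dev-bucket", "us-east-1"
)
# To send it against a running server: urllib.request.urlopen(req)
```

Keeping construction separate from sending makes it easy to log or review the payload before it reaches the server.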

Production Warehouse (STS Credential Vending)

curl -X POST http://localhost:8080/api/v1/warehouses \
  -H "X-Pangolin-Tenant: <tenant-id>" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "production_warehouse",
    "vending_strategy": {
      "type": "AwsSts",
      "role_arn": "arn:aws:iam::123456789012:role/PangolinDataAccess"
    },
    "storage_config": {
      "s3.bucket": "my-prod-bucket",
      "s3.region": "us-east-1"
    }
  }'

Configuration:

  • vending_strategy: Defines how Pangolin provisions temporary credentials (STS, SAS, OAuth).
  • Required for production environments where direct IAM access is prohibited.
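A malformed role ARN is an easy mistake to make in this payload. The sketch below builds the body shown above and rejects strings that do not look like an IAM role ARN before anything is sent. The helper name `sts_warehouse_payload` and the validation step are assumptions added for illustration; nothing in this guide says Pangolin performs the same check.

```python
import re

# General shape of an IAM role ARN: arn:<partition>:iam::<12-digit account>:role/<name>
ROLE_ARN_PATTERN = re.compile(r"arn:aws[a-z-]*:iam::\d{12}:role/[\w+=,.@/-]+")


def sts_warehouse_payload(name, role_arn, bucket, region):
    """Return the request body for an STS-vending warehouse (hypothetical helper)."""
    if not ROLE_ARN_PATTERN.fullmatch(role_arn):
        raise ValueError(f"not an IAM role ARN: {role_arn!r}")
    return {
        "name": name,
        "vending_strategy": {"type": "AwsSts", "role_arn": role_arn},
        # Key names copied from the production example above.
        "storage_config": {"s3.bucket": bucket, "s3.region": region},
    }


payload = sts_warehouse_payload(
    "production_warehouse",
    "arn:aws:iam::123456789012:role/PangolinDataAccess",
    "my-prod-bucket",
    "us-east-1",
)
```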

Storage Backend Configuration

AWS S3

{
  "type": "s3",
  "bucket": "my-bucket",
  "region": "us-east-1",
  "role_arn": "arn:aws:iam::123456789012:role/DataAccess"
}

Azure Blob Storage

{
  "type": "azure",
  "account_name": "mystorageaccount",
  "container": "data"
}

Google Cloud Storage

{
  "type": "gcs",
  "bucket": "my-gcs-bucket",
  "project_id": "my-project"
}

MinIO (S3-Compatible)

{
  "type": "s3",
  "bucket": "minio-bucket",
  "endpoint": "http://minio:9000",
  "allow_http": true,
  "s3.path-style-access": "true"
}
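When several backends are in play, a small helper can catch a missing field before the request goes out. This is a sketch only: the required-field sets are inferred from the examples above, not from a published schema, and `make_storage_config` is a hypothetical name.

```python
# Assumption: minimum fields per backend, inferred from the examples above.
REQUIRED_FIELDS = {
    "s3": {"bucket"},
    "azure": {"account_name", "container"},
    "gcs": {"bucket", "project_id"},
}


def make_storage_config(backend, fields):
    """Return a storage_config dict, failing early on unknown types or missing keys."""
    if backend not in REQUIRED_FIELDS:
        raise ValueError(f"unknown storage type: {backend!r}")
    missing = REQUIRED_FIELDS[backend] - set(fields)
    if missing:
        raise ValueError(f"{backend} config is missing: {sorted(missing)}")
    return {"type": backend, **fields}


minio = make_storage_config(
    "s3",
    {
        "bucket": "minio-bucket",
        "endpoint": "http://minio:9000",
        "allow_http": True,
        "s3.path-style-access": "true",
    },
)
azure = make_storage_config(
    "azure", {"account_name": "mystorageaccount", "container": "data"}
)
```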

API Endpoints

List Warehouses

GET /api/v1/warehouses

Headers:

  • Authorization: Bearer <token>
  • X-Pangolin-Tenant: <tenant-id>

Create Warehouse

POST /api/v1/warehouses

Body (with STS):

{
  "name": "main_warehouse",
  "use_sts": true,
  "storage_config": {
    "type": "s3",
    "bucket": "my-bucket",
    "region": "us-east-1",
    "role_arn": "arn:aws:iam::123456789012:role/PangolinRole"
  }
}

Get Warehouse

GET /api/v1/warehouses/{name}
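The list and get endpoints differ only in the trailing path segment. A minimal sketch of building both URLs follows; the helper name `warehouse_url` is an assumption, and the name is percent-encoded defensively in case it contains characters that are not URL-safe.

```python
from urllib.parse import quote


def warehouse_url(base_url, name=None):
    """Return the list URL, or the single-warehouse URL when a name is given."""
    url = f"{base_url.rstrip('/')}/api/v1/warehouses"
    if name is not None:
        # safe="" also encodes "/" so the name stays a single path segment.
        url += "/" + quote(name, safe="")
    return url


list_url = warehouse_url("http://localhost:8080")
get_url = warehouse_url("http://localhost:8080", "main_warehouse")
```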


Catalog Association

After creating a warehouse, create catalogs that reference it:

curl -X POST http://localhost:8080/api/v1/catalogs \
  -H "X-Pangolin-Tenant: <tenant-id>" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "analytics",
    "warehouse_name": "main_warehouse",
    "storage_location": "s3://my-bucket/analytics"
  }'
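In the example above, the catalog's storage_location sits inside the bucket that the warehouse is configured with. The sketch below builds the catalog body and checks that relationship up front. Treat the check as an assumption: this guide does not state that Pangolin requires the two to match, and `catalog_payload` is a hypothetical helper.

```python
from urllib.parse import urlparse


def catalog_payload(name, warehouse_name, storage_location, warehouse_bucket):
    """Build the catalog body, checking that the location is in the warehouse bucket."""
    parsed = urlparse(storage_location)
    # For s3://my-bucket/analytics, scheme is "s3" and netloc is "my-bucket".
    if parsed.scheme != "s3" or parsed.netloc != warehouse_bucket:
        raise ValueError(
            f"{storage_location!r} is not inside s3://{warehouse_bucket}"
        )
    return {
        "name": name,
        "warehouse_name": warehouse_name,
        "storage_location": storage_location,
    }


catalog = catalog_payload(
    "analytics", "main_warehouse", "s3://my-bucket/analytics", "my-bucket"
)
```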

Best Practices

  1. Use STS in Production: Set use_sts: true (or configure a vending_strategy) for production warehouses so clients receive temporary credentials
  2. Static Credentials for Development: Set use_sts: false for local development
  3. Separate Warehouses by Environment: Create separate warehouses for dev, staging, and production
  4. Scope Storage Locations: Use each catalog's storage_location to organize data within a warehouse's bucket
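One way to apply practices 1 through 3 together is to generate the warehouse definitions from a single table of environments. This is a sketch under stated assumptions: the environment names, the staging bucket name, and the choice to enable STS in staging are illustrative, not prescribed by Pangolin.

```python
# Assumption: per-environment settings are placeholders to adapt.
ENVIRONMENTS = {
    "dev": {"bucket": "my-dev-bucket", "use_sts": False},
    "staging": {"bucket": "my-staging-bucket", "use_sts": True},
    "production": {"bucket": "my-prod-bucket", "use_sts": True},
}


def warehouse_payloads(region="us-east-1"):
    """Return one create-warehouse body per environment."""
    return [
        {
            "name": f"{env}_warehouse",
            "use_sts": settings["use_sts"],
            "storage_config": {
                "type": "s3",
                "bucket": settings["bucket"],
                "region": region,
            },
        }
        for env, settings in ENVIRONMENTS.items()
    ]


payloads = warehouse_payloads()
```

Each body can then be posted to /api/v1/warehouses with the headers shown earlier.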

Related Documentation