This guide shows how to configure Apache Iceberg clients (PyIceberg, PySpark, Trino, etc.) to connect to Pangolin.
Pangolin supports two authentication modes:
- **No-Auth Mode**: set the `PANGOLIN_NO_AUTH=1` environment variable on the server
  - No authentication required from clients
  - Uses the default tenant automatically
  - Not recommended for production
- **Token Auth Mode**: JWT tokens with tenant information
  - Iceberg REST specification compliant
  - Required for multi-tenant deployments
Install the client dependencies:

```bash
pip install "pyiceberg[s3fs,pyarrow]" pyjwt
```

When the Pangolin server is running with `PANGOLIN_NO_AUTH=1`:
```python
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "pangolin",
    **{
        "uri": "http://localhost:8080",
        "prefix": "analytics",  # Catalog name
        "s3.endpoint": "http://localhost:9000",
        "s3.access-key-id": "minioadmin",
        "s3.secret-access-key": "minioadmin",
        "s3.region": "us-east-1",
    }
)

# List namespaces
namespaces = catalog.list_namespaces()
print(namespaces)
```

For token authentication, first request a token from the server:

```bash
curl -X POST http://localhost:8080/api/v1/tokens \
  -H "Content-Type: application/json" \
  -d '{
    "tenant_id": "00000000-0000-0000-0000-000000000001",
    "username": "user@example.com",
    "expires_in_hours": 24
  }'
```

Response:

```json
{
  "token": "eyJ0eXAiOiJKV1QiLCJhbGc...",
  "expires_at": "2025-12-14T02:37:49+00:00",
  "tenant_id": "00000000-0000-0000-0000-000000000001"
}
```

Then use the token with PyIceberg:

```python
from pyiceberg.catalog import load_catalog

# Token from /api/v1/tokens endpoint
token = "eyJ0eXAiOiJKV1QiLCJhbGc..."

catalog = load_catalog(
    "pangolin",
    **{
        "uri": "http://localhost:8080",
        "prefix": "analytics",  # Catalog name
        "token": token,  # Bearer token authentication
        # S3 configuration is optional here!
        # If the warehouse has vending enabled, Pangolin will provide credentials.
    }
)
```

> **Tip**
> **Credential Vending**: If your Pangolin warehouse is configured with a `vending_strategy` (like `AwsStatic` or `AwsSts`), you do not need to provide `s3.access-key-id` or `s3.secret-access-key` in your client configuration. Pangolin will automatically vend temporary credentials to the client.
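With vending enabled, the file-based configuration shrinks in the same way. A minimal `.pyiceberg.yaml` sketch (token placeholder as above; note the absence of `s3.*` keys):

```yaml
catalog:
  pangolin:
    uri: http://localhost:8080
    prefix: analytics
    token: eyJ0eXAiOiJKV1QiLCJhbGc... # From /api/v1/tokens
    # No s3.* keys: Pangolin vends temporary storage credentials to the client
```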
Alternatively, generate tokens programmatically by signing a JWT that carries the `tenant_id` claim (the secret must match the server's JWT secret):

```python
import datetime

import jwt
from pyiceberg.catalog import load_catalog


def generate_token(tenant_id: str, secret: str = "secret") -> str:
    """Generate a JWT token with tenant_id"""
    payload = {
        "sub": "api-user",
        "tenant_id": tenant_id,
        "roles": ["User"],
        "exp": datetime.datetime.now(datetime.timezone.utc) + datetime.timedelta(hours=24),
    }
    return jwt.encode(payload, secret, algorithm="HS256")


# Generate token
token = generate_token("00000000-0000-0000-0000-000000000001")

# Use with PyIceberg
catalog = load_catalog(
    "pangolin",
    **{
        "uri": "http://localhost:8080",
        "prefix": "analytics",
        "token": token,
    }
)
```

You can also configure the catalog in `.pyiceberg.yaml`. For no-auth mode:

```yaml
catalog:
  pangolin:
    uri: http://localhost:8080
    prefix: analytics
    s3.endpoint: http://localhost:9000
    s3.access-key-id: minioadmin
    s3.secret-access-key: minioadmin
    s3.region: us-east-1
```

With token authentication:

```yaml
catalog:
  pangolin:
    uri: http://localhost:8080
    prefix: analytics
    token: eyJ0eXAiOiJKV1QiLCJhbGc... # From /api/v1/tokens
    s3.endpoint: http://localhost:9000
    s3.access-key-id: minioadmin
    s3.secret-access-key: minioadmin
    s3.region: us-east-1
```

Then in Python:

```python
from pyiceberg.catalog import load_catalog

catalog = load_catalog("pangolin")
```

For PySpark, install it first:

```bash
pip install pyspark
```

Then configure the Iceberg REST catalog on the Spark session:

```python
from pyspark.sql import SparkSession

token = "eyJ0eXAiOiJKV1QiLCJhbGc..."  # Bearer token from /api/v1/tokens

spark = SparkSession.builder \
    .appName("Pangolin Example") \
    .config("spark.jars.packages", "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.0,org.apache.iceberg:iceberg-aws-bundle:1.5.0") \
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions") \
    .config("spark.sql.catalog.pangolin", "org.apache.iceberg.spark.SparkCatalog") \
    .config("spark.sql.catalog.pangolin.catalog-impl", "org.apache.iceberg.rest.RESTCatalog") \
    .config("spark.sql.catalog.pangolin.uri", "http://localhost:8080/v1/analytics") \
    .config("spark.sql.catalog.pangolin.io-impl", "org.apache.iceberg.aws.s3.S3FileIO") \
    .config("spark.sql.catalog.pangolin.s3.endpoint", "http://localhost:9000") \
    .config("spark.sql.catalog.pangolin.s3.access-key-id", "minioadmin") \
    .config("spark.sql.catalog.pangolin.s3.secret-access-key", "minioadmin") \
    .config("spark.sql.catalog.pangolin.s3.path-style-access", "true") \
    .config("spark.sql.catalog.pangolin.header.Authorization", f"Bearer {token}") \
    .getOrCreate()
```

Trino connects through its Iceberg connector; configure a catalog properties file (e.g. `etc/catalog/pangolin.properties`):

```properties
connector.name=iceberg
iceberg.catalog.type=rest
iceberg.rest.uri=http://localhost:8080/v1/analytics
iceberg.rest.http-headers=Authorization:Bearer eyJ0eXAiOiJKV1QiLCJhbGc...
fs.s3a.endpoint=http://localhost:9000
fs.s3a.access-key=minioadmin
fs.s3a.secret-key=minioadmin
fs.s3a.path.style.access=true
```

You can create an Iceberg catalog in Flink SQL using the `rest` catalog type.
```sql
CREATE CATALOG pangolin WITH (
    'type'='iceberg',
    'catalog-type'='rest',
    'uri'='http://localhost:8080/v1/analytics',
    'token'='<YOUR_JWT_TOKEN>',
    'warehouse'='s3://warehouse-bucket/analytics'
);

USE CATALOG pangolin;
```

> **Note**
> Ensure you have the `flink-table-api-java-bridge` and `iceberg-flink-runtime` jars available in your Flink environment.
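Once the catalog is registered, regular Flink SQL statements run against it. A sketch (the `demo` database and `events` table are hypothetical names):

```sql
CREATE DATABASE IF NOT EXISTS demo;

CREATE TABLE IF NOT EXISTS demo.events (
    id   BIGINT,
    name STRING
);

INSERT INTO demo.events VALUES (1, 'a'), (2, 'b');

SELECT * FROM demo.events;
```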
Dremio supports connecting to Iceberg REST catalogs as a source.

- **Add Source**: Click "Add Data Source" and select "Iceberg REST Catalog".
- **General Settings**:
  - Name: `pangolin_analytics`
  - Endpoint URL: `http://pangolin:8080/v1/analytics/iceberg` (ensure it is reachable from Dremio)
- **Authentication**:
  - If using No-Auth Mode, no further auth is needed.
  - If using Token Auth, you may need to pass the token via the `header.Authorization` property under "Advanced Options" -> "Connection Properties" if Dremio's UI doesn't explicitly ask for a Bearer token yet:
    - Property: `header.Authorization`
    - Value: `Bearer <YOUR_JWT_TOKEN>`
- **Credential Vending**:
  - Enable "Use vended credentials" (supported by Pangolin) to let Dremio use the credentials provided by the catalog endpoint for S3/Azure/GCS access.
For production deployments, use environment variables:

```python
import os

from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "pangolin",
    **{
        "uri": os.getenv("PANGOLIN_URI", "http://localhost:8080"),
        "prefix": os.getenv("PANGOLIN_CATALOG", "analytics"),
        "token": os.getenv("PANGOLIN_TOKEN"),  # JWT token
        "s3.endpoint": os.getenv("S3_ENDPOINT"),
        "s3.access-key-id": os.getenv("AWS_ACCESS_KEY_ID"),
        "s3.secret-access-key": os.getenv("AWS_SECRET_ACCESS_KEY"),
        "s3.region": os.getenv("AWS_REGION", "us-east-1"),
    }
)
```

Common errors:

- **401 Unauthorized**:
  - In no-auth mode: ensure the server has `PANGOLIN_NO_AUTH=1` set
  - In production: check that the Bearer token is valid and not expired
- **404 Not Found**: ensure the catalog name in the prefix/URI matches an existing catalog
- **S3 Access Denied**: verify S3 credentials and endpoint configuration
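For the 401 case, a client-side expiry check can be sketched as follows (assumes the HS256 tokens described in this guide; the helper name and demo secret are illustrative):

```python
import datetime

import jwt  # pip install pyjwt


def token_is_live(token: str) -> bool:
    """Return True if the token's exp claim is still in the future (signature not verified)."""
    claims = jwt.decode(token, options={"verify_signature": False})
    exp = datetime.datetime.fromtimestamp(claims["exp"], tz=datetime.timezone.utc)
    return exp > datetime.datetime.now(tz=datetime.timezone.utc)


# Self-contained demo: mint a short-lived token and check it
demo = jwt.encode(
    {
        "tenant_id": "00000000-0000-0000-0000-000000000001",
        "exp": datetime.datetime.now(datetime.timezone.utc) + datetime.timedelta(hours=1),
    },
    "secret",
    algorithm="HS256",
)
print(token_is_live(demo))  # True while the token is unexpired
```

Disabling `verify_signature` also skips PyJWT's built-in `exp` check, so the function compares the claim against the current time itself.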
Enable debug logging in PyIceberg:

```python
import logging

logging.basicConfig(level=logging.DEBUG)
```

Check a token's claims:

```python
import jwt

decoded = jwt.decode(token, options={"verify_signature": False})
print(f"Tenant ID: {decoded.get('tenant_id')}")
print(f"Expires: {decoded.get('exp')}")
```

If you were using `header.X-Pangolin-Tenant`:

Old (not compatible with PyIceberg):

```python
"header.X-Pangolin-Tenant": "00000000-0000-0000-0000-000000000001"
```

New (Iceberg REST spec compliant):

```python
"token": generate_token("00000000-0000-0000-0000-000000000001")
```

The custom header approach still works for direct API calls but is not supported by PyIceberg due to its authentication architecture.