diff --git a/modules/ROOT/nav.adoc b/modules/ROOT/nav.adoc index 422c6f54..a5a4fec5 100644 --- a/modules/ROOT/nav.adoc +++ b/modules/ROOT/nav.adoc @@ -23,6 +23,8 @@ *** xref:get-started:cluster-types/byoc/remote-read-replicas.adoc[] ** xref:get-started:cluster-types/create-dedicated-cloud-cluster.adoc[] +* xref:manage:sql-engine/workshop.adoc[Redpanda SQL Engine] + * xref:ai-agents:index.adoc[Agentic AI] ** xref:ai-agents:mcp/index.adoc[MCP] *** xref:ai-agents:mcp/overview.adoc[MCP Overview] diff --git a/modules/manage/pages/sql-engine/workshop.adoc b/modules/manage/pages/sql-engine/workshop.adoc new file mode 100644 index 00000000..2be1b365 --- /dev/null +++ b/modules/manage/pages/sql-engine/workshop.adoc @@ -0,0 +1,514 @@ += Redpanda SQL Engine Quickstart +:description: Learn how to provision, configure, and use Redpanda SQL Engine (Oxla) with BYOC clusters to query streaming data using SQL. +:page-categories: Cloud, SQL, BYOC + +This quickstart guides you through provisioning and using Redpanda SQL Engine (formerly Oxla) in a Bring Your Own Cloud (BYOC) environment. You'll learn to query Redpanda topics using standard SQL, manage users and permissions, and scale your SQL Engine deployment. + +== What is Redpanda SQL Engine? + +Redpanda SQL Engine is a SQL query engine that runs alongside your Redpanda cluster, allowing you to query streaming data using familiar SQL syntax instead of building custom consumers or streaming applications. It transforms your Redpanda cluster into a SQL-addressable analytics surface, making it easier to explore, analyze, and join streaming data with minimal integration overhead. + +From an architecture perspective, SQL Engine runs as its own cluster with a configurable number of replicas. It can be fronted by either an internal or external load balancer, allowing you to restrict access to in-cluster components (like Redpanda Connect) or expose it more broadly to applications within your VPC or to public clients. 
+ +SQL Engine integrates directly with Redpanda Schema Registry to automatically derive table schemas from your topics. This means you can immediately query topics containing Protobuf data without manually defining schemas. The engine supports standard SQL operations including joins, aggregations, window functions, and complex analytical queries. + +In Redpanda Cloud BYOC deployments, SQL Engine is deployed as an additional service inside your Redpanda data plane. It's controlled using the `redpanda_oxla` section of your cluster specification and can be enabled per cluster using Fleet Management API operations. + +== Limitations + +SQL Engine has the following limitations in its current state: + +[cols="1,3"] +|=== +| Limitation | Description + +| Read-only access +| You cannot write to Redpanda-backed tables using `INSERT`, `UPDATE`, or `DELETE` statements. + +| Protobuf only +| Tables can only be created from Redpanda topics that use Protobuf serialization with schemas stored in Schema Registry. + +| Schema Registry required +| Table schemas must be derived from Redpanda Schema Registry; manual schema definition is not supported. + +| Backward-compatible schemas only +| Schema evolution must be backward-compatible; breaking schema changes may cause query failures. + +| Flat structures only +| The first release supports only the `FLATTEN` struct mapping policy; compound types, variants, and JSON mapping are not yet supported. + +| Single-dimensional arrays +| Arrays of arrays and nested complex types are not supported. + +| No streaming queries +| Queries execute in on-demand mode and return results when complete; continuous streaming queries are not supported. 
+ +| Limited type support +a| The following Redpanda data types are not supported: + +* UUID +* Fixed-length binary types +* Map types +* Variant types (coming in a future release) +* Geometry and geography types (limited support) + +| No consumer groups +| Redpanda SQL Engine does not use Kafka consumer groups; partition assignment is handled by the query planner. +|=== + +== Prerequisites + +Before creating a BYOC cluster with Redpanda SQL Engine, you must: + +* Have access to Redpanda Cloud and a resource group that will own the BYOC cluster +* Enable the `enable-redpanda-oxla` feature flag for your resource group in LaunchDarkly +* Decide on the network exposure mode: +** Cluster-internal only (no load balancer) +** Internal load balancer (VPC peered access) +** External load balancer (publicly accessible endpoint) + +== Provision Redpanda SQL Engine using BYOC + +Provisioning a BYOC cluster with Redpanda SQL Engine follows the standard BYOC workflow. The only differences are that you must enable the SQL Engine feature flag for your resource group and, optionally, configure network exposure. + +=== Create the BYOC cluster + +Follow the standard BYOC cluster creation process: + +. In the Redpanda Cloud UI, navigate to your Oxla-enabled resource group. +. Click *Create Cluster* and select *BYOC*. +. Choose your cloud provider (AWS, GCP, or Azure) and region. +. Configure networking details (VPC/VNet, CIDR ranges). +. Download the Terraform bundle generated by the wizard. +. Extract the bundle locally and run the bootstrap script: ++ +[,bash] +---- +cd +./bootstrap.sh +---- + +. Wait for the cluster to show the Ready state in the Redpanda Cloud UI. + +When the cluster is provisioned, Redpanda SQL Engine is automatically deployed into the same data plane because the resource group has the `enable-redpanda-oxla` feature flag enabled. 
+ +=== Configure Redpanda SQL Engine network exposure + +By default, Redpanda SQL Engine deploys with cluster-internal access only. To change the network exposure mode, use the Fleet Management API. + +To enable SQL Engine with an internal load balancer (VPC access): + +[,bash] +---- +fleetmgmt operation create --type update \ + --update-paths 'spec.redpanda_oxla' \ + --redpanda-proto '{"spec": {"redpanda_oxla": {"enabled": true, "replicas": 1, "load_balancer": {"enabled": true, "type": 1}}}}' \ + --cluster-ids +---- + +To enable SQL Engine with an external load balancer (public access): + +[,bash] +---- +fleetmgmt operation create --type update \ + --update-paths 'spec.redpanda_oxla' \ + --redpanda-proto '{"spec": {"redpanda_oxla": {"enabled": true, "replicas": 1, "load_balancer": {"enabled": true, "type": 2}}}}' \ + --cluster-ids +---- + +After enabling an external load balancer, manually run the appropriate Terraform apply for the infrastructure module on the agent VM. + +== Manage Redpanda SQL Engine + +Redpanda SQL Engine uses PostgreSQL-compatible authentication and authorization. You manage users, roles, and privileges using standard SQL commands. + +=== Retrieve the superuser password + +Redpanda SQL Engine creates a superuser account during provisioning. 
Retrieve the password from the Kubernetes secret: + +[,bash] +---- +kubectl get secret -n redpanda-oxla oxla-superuser -o jsonpath='{.data.password}' | base64 -d +---- + +=== Create users + +Connect to Redpanda SQL Engine and create users with the `CREATE ROLE` statement: + +[,sql] +---- +CREATE ROLE data_analyst WITH LOGIN PASSWORD 'secure_password'; +---- + +=== Grant privileges + +Grant specific privileges to users or roles: + +[,sql] +---- +-- Grant table read access +GRANT SELECT ON TABLE customer_events TO data_analyst; + +-- Grant schema-level access +GRANT USAGE ON SCHEMA public TO data_analyst; + +-- Grant all privileges on a table +GRANT ALL PRIVILEGES ON TABLE order_events TO data_analyst; +---- + +=== Authorization and authentication + +Redpanda SQL Engine implements PostgreSQL-compatible role-based access control (RBAC): + +**Authentication:** Users authenticate with a username and password over the PostgreSQL wire protocol. Redpanda SQL Engine supports standard PostgreSQL authentication methods, including SCRAM-SHA-256. + +**Authorization:** Privileges control what users can do: + +[cols="1,3"] +|=== +| Privilege Name | Description + +| `SELECT` +| Read data from tables + +| `INSERT` +| Write data to tables (not supported for Redpanda-backed tables) + +| `UPDATE` +| Modify existing data (not supported for Redpanda-backed tables) + +| `DELETE` +| Remove data (not supported for Redpanda-backed tables) + +| `USAGE` +| Access schemas and other objects + +| `CREATE` +| Create new objects (tables, connections, sources) +|=== + +Use `GRANT` and `REVOKE` statements to manage access control. + +== Create tables + +Redpanda SQL Engine allows you to create tables backed by Redpanda topics. Tables are read-only and automatically derive their schema from Redpanda Schema Registry. 
+ +=== Create a connection to Redpanda + +First, define a connection to your Redpanda cluster: + +[,sql] +---- +CREATE KAFKA CONNECTION redpanda_prod +OPTIONS ( + 'initial_brokers' 'broker-1:9092,broker-2:9092,broker-3:9092', + 'schema_registry_url' 'http://schema-registry:8081', + 'sasl_mechanism' 'SCRAM-SHA-256', + 'sasl_user' 'sql-engine-user', + 'sasl_password' 'your-password' +); +---- + +For mTLS authentication: + +[,sql] +---- +CREATE KAFKA CONNECTION redpanda_mtls +OPTIONS ( + 'initial_brokers' 'broker-1:9092,broker-2:9092,broker-3:9092', + 'schema_registry_url' 'https://schema-registry:8081', + 'truststore' '', + 'key_store_key' '', + 'key_store_cert' '' +); +---- + +=== Create a table from a Redpanda topic + +Create a queryable table from a Redpanda topic: + +[,sql] +---- +CREATE KAFKA SOURCE customer_events +TOPIC customer_events_topic +CONNECTION redpanda_prod; +---- + +With optional parameters: + +[,sql] +---- +CREATE KAFKA SOURCE order_events +TOPIC orders +SCHEMA_SUBJECT orders-value +SCHEMA_LOOKUP_POLICY LATEST +ERROR_HANDLING_POLICY FILL_NULL +CONNECTION redpanda_prod; +---- + +[cols="1,3"] +|=== +| Parameter Name | Description + +| `TOPIC` +| Name of the Redpanda topic + +| `SCHEMA_SUBJECT` +| Schema Registry subject name (optional, defaults to TopicNameStrategy) + +| `SCHEMA_LOOKUP_POLICY` +a| How to look up record schemas: + +* `LATEST` - Use the latest schema from Schema Registry (default) +* `SCHEMA_ID` - Use the schema ID prefixed in each record + +| `ERROR_HANDLING_POLICY` +a| How to handle deserialization failures: + +* `FAIL` - Fail the query (default) +* `FILL_NULL` - Fill columns with NULL values +* `DROP_RECORD` - Skip the record + +| `CONNECTION` +| Connection name to use +|=== + +=== Query the table + +Query Redpanda-backed tables using standard SQL: + +[,sql] +---- +-- Simple query +SELECT customer_id, event_type, timestamp +FROM customer_events +WHERE event_type = 'purchase' +LIMIT 100; + +-- Aggregation +SELECT event_type, COUNT(*) 
as event_count +FROM customer_events +GROUP BY event_type; + +-- Join with another table +SELECT + c.customer_id, + c.event_type, + o.order_total +FROM customer_events c +JOIN order_events o ON c.customer_id = o.customer_id +WHERE c.timestamp > NOW() - INTERVAL '1 day'; +---- + +=== Metadata columns + +All Redpanda-backed tables include metadata columns: + +[cols="2,3"] +|=== +| Metadata Column Name | Description + +| `redpanda.partition` +| Partition ID + +| `redpanda.offset` +| Record offset + +| `redpanda.timestamp` +| Record timestamp + +| `redpanda.timestamp_source` +| Timestamp source (broker or producer) + +| `redpanda.headers` +| Message headers + +| `redpanda.key` +| Raw key (populated on deserialization error with FILL_NULL policy) + +| `redpanda.value` +| Raw value (populated on deserialization error with FILL_NULL policy) +|=== + +Query metadata columns like any other column: + +[,sql] +---- +SELECT + customer_id, + event_type, + redpanda.partition, + redpanda.offset, + redpanda.timestamp +FROM customer_events; +---- + +== Install SQL Engine client tools + +To query SQL Engine, use any PostgreSQL-compatible client. This section shows how to use `psql`, the standard PostgreSQL command-line client. + +=== Install psql + +**On macOS:** + +[,bash] +---- +brew install postgresql +---- + +**On Ubuntu/Debian:** + +[,bash] +---- +sudo apt-get update +sudo apt-get install postgresql-client +---- + +**On RHEL/CentOS:** + +[,bash] +---- +sudo yum install postgresql +---- + +=== Get the SQL Engine endpoint + +Retrieve the SQL Engine connection endpoint based on your load balancer configuration. 
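
The choice of lookup command depends only on the exposure mode, so it can be captured in a small shell helper. The following is a minimal sketch, not part of any Redpanda tooling: the function names (`oxla_service_name`, `oxla_jsonpath`, `oxla_endpoint_command`) and the mode keywords (`internal`, `external`, `cluster`) are hypothetical, while the service names, namespace, and JSONPath expressions are the ones used by the commands in this section. The helper prints the `kubectl` command instead of running it, so you can inspect it before use.

```shell
# Hypothetical helper functions (illustration only, not Redpanda tooling).

# Map an exposure mode to the Kubernetes service name used in this section.
oxla_service_name() {
  case "$1" in
    internal) echo "oxla-lb" ;;
    external) echo "oxla-external-lb" ;;
    cluster)  echo "oxla" ;;
    *) echo "unknown exposure mode: $1" >&2; return 1 ;;
  esac
}

# Map an exposure mode to the JSONPath expression that yields the endpoint.
oxla_jsonpath() {
  case "$1" in
    internal|external) echo '{.status.loadBalancer.ingress[0].hostname}' ;;
    cluster)           echo '{.spec.clusterIP}' ;;
    *) echo "unknown exposure mode: $1" >&2; return 1 ;;
  esac
}

# Print (do not run) the kubectl command for the given exposure mode.
oxla_endpoint_command() {
  svc=$(oxla_service_name "$1") || return 1
  jp=$(oxla_jsonpath "$1") || return 1
  printf "kubectl get svc -n redpanda-oxla %s -o jsonpath='%s'\n" "$svc" "$jp"
}

# Example: show the lookup command for an internal load balancer.
oxla_endpoint_command internal
```

If the `.hostname` field comes back empty, the load balancer may have been assigned an IP address instead; Kubernetes reports that in the `.ip` field of the same `ingress` entry, so you may need to adjust the JSONPath for your provider.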
+ +**For internal load balancer:** + +[,bash] +---- +kubectl get svc -n redpanda-oxla oxla-lb -o jsonpath='{.status.loadBalancer.ingress[0].hostname}' +---- + +**For external load balancer:** + +[,bash] +---- +kubectl get svc -n redpanda-oxla oxla-external-lb -o jsonpath='{.status.loadBalancer.ingress[0].hostname}' +---- + +**For cluster-internal (no load balancer):** + +[,bash] +---- +kubectl get svc -n redpanda-oxla oxla -o jsonpath='{.spec.clusterIP}' +---- + +=== Connect with psql + +Connect to SQL Engine using the superuser credentials: + +[,bash] +---- +psql -h -p 5432 -U oxla_admin -d oxla +---- + +Enter the superuser password when prompted. + +=== Run queries + +Once connected, you can run SQL queries: + +[,sql] +---- +-- List all tables +\dt + +-- Describe a table +\d customer_events + +-- Run a query +SELECT * FROM customer_events LIMIT 10; + +-- Exit psql +\q +---- + +== Scale up or down using Data Plane APIs + +You can scale SQL Engine by adjusting the number of replicas using Fleet Management operations. + +=== Scale up + +To increase the number of Redpanda SQL Engine replicas to handle higher query loads: + +[,bash] +---- +fleetmgmt operation create --type update \ + --update-paths 'spec.redpanda_oxla.replicas' \ + --redpanda-proto '{"spec": {"redpanda_oxla": {"replicas": 3}}}' \ + --cluster-ids +---- + +=== Scale down + +To decrease the number of replicas to reduce resource usage: + +[,bash] +---- +fleetmgmt operation create --type update \ + --update-paths 'spec.redpanda_oxla.replicas' \ + --redpanda-proto '{"spec": {"redpanda_oxla": {"replicas": 1}}}' \ + --cluster-ids +---- + +=== Verify scaling operations + +To check the operation status: + +[,bash] +---- +fleetmgmt operation get +---- + +To verify the new replica count: + +[,bash] +---- +kubectl get pods -n redpanda-oxla +---- + +== Delete or destroy the cluster + +To remove Redpanda SQL Engine from your cluster or delete the entire BYOC cluster, use Fleet Management operations. 
+ +=== Disable SQL Engine only + +To remove Redpanda SQL Engine while keeping the Redpanda cluster running: + +[,bash] +---- +fleetmgmt operation create --type update \ + --update-paths 'spec.redpanda_oxla.enabled' \ + --redpanda-proto '{"spec": {"redpanda_oxla": {"enabled": false}}}' \ + --cluster-ids +---- + +This stops all Redpanda SQL Engine pods and removes the service from your data plane. + +=== Delete the entire BYOC cluster + +To delete the BYOC cluster, including Redpanda SQL Engine: + +. In the Redpanda Cloud Console, navigate to the cluster details page. +. Click *Delete Cluster*. +. Confirm the deletion. +. Run the Terraform **destroy** command from your BYOC bundle directory: ++ +[,bash] +---- +cd +terraform destroy +---- + +. Confirm the destruction when prompted. + +This removes all Redpanda and SQL Engine resources from your cloud account. + +== Next steps + +* Explore the https://docs.oxla.com/welcome[SQL Engine documentation^] for detailed SQL syntax and function references +* Review xref:manage:schema-reg/schema-reg-overview.adoc[] for managing Protobuf schemas +* Learn about xref:get-started:cluster-types/byoc/index.adoc[] deployment options