diff --git a/docs.json b/docs.json index 57a26411..761afa04 100644 --- a/docs.json +++ b/docs.json @@ -205,7 +205,14 @@ "references/dimensions", "references/tables", "references/joins", - "references/pre-aggregates", + { + "group": "Pre-aggregates", + "pages": [ + "references/pre-aggregates/overview", + "references/pre-aggregates/getting-started", + "references/pre-aggregates/monitoring" + ] + }, "references/lightdash-cli", "references/lightdash-config-yml", "references/sql-variables" @@ -412,6 +419,10 @@ } }, "redirects": [ + { + "source": "/references/pre-aggregates", + "destination": "/references/pre-aggregates/overview" + }, { "source": "/guides/ai-analyst", "destination": "/guides/ai-agents" diff --git a/images/references/pre-aggregates/pre-agg-analytics.png b/images/references/pre-aggregates/pre-agg-analytics.png new file mode 100644 index 00000000..c04dc7d1 Binary files /dev/null and b/images/references/pre-aggregates/pre-agg-analytics.png differ diff --git a/images/references/pre-aggregates/pre-agg-dashboard-audit.png b/images/references/pre-aggregates/pre-agg-dashboard-audit.png new file mode 100644 index 00000000..9f130ed1 Binary files /dev/null and b/images/references/pre-aggregates/pre-agg-dashboard-audit.png differ diff --git a/images/references/pre-aggregates/pre-agg-dashboard-menu.png b/images/references/pre-aggregates/pre-agg-dashboard-menu.png new file mode 100644 index 00000000..8314ff5e Binary files /dev/null and b/images/references/pre-aggregates/pre-agg-dashboard-menu.png differ diff --git a/images/references/pre-aggregates/pre-agg-materialization-detail.png b/images/references/pre-aggregates/pre-agg-materialization-detail.png new file mode 100644 index 00000000..f5f21bbe Binary files /dev/null and b/images/references/pre-aggregates/pre-agg-materialization-detail.png differ diff --git a/images/references/pre-aggregates/pre-agg-materialization-list.png b/images/references/pre-aggregates/pre-agg-materialization-list.png new file mode 100644 index 
00000000..3439c36c Binary files /dev/null and b/images/references/pre-aggregates/pre-agg-materialization-list.png differ diff --git a/references/pre-aggregates.mdx b/references/pre-aggregates.mdx deleted file mode 100644 index c97ef653..00000000 --- a/references/pre-aggregates.mdx +++ /dev/null @@ -1,159 +0,0 @@ ---- -title: "Pre-aggregates reference" -description: "Pre-aggregates materialize aggregated data inside Lightdash so queries can be served from pre-computed results instead of hitting your warehouse." -sidebarTitle: "Pre-aggregates" ---- - - - **Availability:** Pre-aggregates are an [Early Access](/references/workspace/feature-maturity-levels) feature available on **Enterprise plans** only. - - -## What are pre-aggregates? - -Pre-aggregates let you define materialized summaries of your data directly in your dbt YAML. When a user runs a query in Lightdash, the system checks if the query can be answered from a pre-aggregate instead of querying your warehouse. If it matches, the query is served from the pre-computed results — making it significantly faster and reducing warehouse load. - -This is especially useful for dashboards with high traffic or expensive aggregations that don't need real-time data. - -### How it works - -1. You define a pre-aggregate on a model, specifying which dimensions and metrics to include -2. Lightdash materializes the aggregated data on a schedule -3. When a user runs a query, Lightdash checks if all requested dimensions, metrics, and filters are covered by a pre-aggregate -4. If a match is found, the query is served from the materialized data instead of your warehouse - -## Defining pre-aggregates - -Pre-aggregates are defined in your dbt model's YAML file under the `pre_aggregates` key in the model's `meta` (or `config.meta` for dbt v1.10+). 
- - - - ```yaml - models: - - name: orders - meta: - pre_aggregates: - - name: orders_daily_by_status - dimensions: - - status - metrics: - - total_order_amount - - average_order_size - time_dimension: order_date - granularity: day - ``` - - - ```yaml - models: - - name: orders - config: - meta: - pre_aggregates: - - name: orders_daily_by_status - dimensions: - - status - metrics: - - total_order_amount - - average_order_size - time_dimension: order_date - granularity: day - ``` - - - -### Configuration reference - -| Property | Required | Description | -|---|---|---| -| `name` | Yes | Unique identifier for the pre-aggregate. Must contain only letters, numbers, and underscores. | -| `dimensions` | Yes | List of dimension names to include. Must contain at least one dimension. | -| `metrics` | Yes | List of metric names to include. Must contain at least one metric. | -| `time_dimension` | No | A time-based dimension to use for date grouping. Must be paired with `granularity`. | -| `granularity` | No | Time granularity for the `time_dimension`. Must be paired with `time_dimension`. Valid values: `hour`, `day`, `week`, `month`, `year`. | - - - If you specify `time_dimension`, you **must** also specify `granularity`, and vice versa. - - -## Query matching - -When a user runs a query, Lightdash automatically checks if a pre-aggregate can serve the results. 
A pre-aggregate matches when **all** of the following are true: - -- Every dimension in the query is included in the pre-aggregate -- Every metric in the query is included in the pre-aggregate -- Every dimension used in filters is included in the pre-aggregate -- All metrics use [supported metric types](#supported-metric-types) -- The query does not contain custom dimensions, custom metrics, or table calculations -- If the query uses a time dimension, the requested granularity is **equal to or coarser** than the pre-aggregate's granularity (e.g., a `day` pre-aggregate can serve `day`, `week`, `month`, or `year` queries, but not `hour`) - -When multiple pre-aggregates match a query, Lightdash picks the smallest one (fewest dimensions, then fewest metrics as tiebreaker). - -### Dimensions from joined tables - -Pre-aggregates support dimensions from joined tables. Reference them by their full name (e.g., `customers.first_name`) in the `dimensions` list. - -## Supported metric types - -Pre-aggregates support metrics that can be re-aggregated from pre-computed results. The following metric types are supported: - -- `sum` -- `count` -- `min` -- `max` -- `average` - -Queries that include metrics with other types (e.g., `count_distinct`, `median`, `number`) will not match a pre-aggregate and will query the warehouse directly. 
- -## Example - -Here's a complete example showing a model with a pre-aggregate: - -```yaml -models: - - name: orders - config: - meta: - joins: - - join: customers - sql_on: ${customers.customer_id} = ${orders.customer_id} - pre_aggregates: - - name: orders_daily_by_status - dimensions: - - status - metrics: - - total_order_amount - - average_order_size - time_dimension: order_date - granularity: day - columns: - - name: order_date - config: - meta: - dimension: - type: date - - name: status - config: - meta: - dimension: - type: string - - name: amount - config: - meta: - metrics: - total_order_amount: - type: sum - average_order_size: - type: average -``` - -With this pre-aggregate, the following queries would be served from materialized data: - -- Total order amount by status, grouped by day/week/month/year -- Average order size by status, grouped by month -- Total order amount filtered by status - -These queries would **not** match and would query the warehouse directly: - -- Queries including `count_distinct` metrics -- Queries grouped by a dimension not in the pre-aggregate (e.g., `customer_id`) -- Queries with hourly granularity (finer than the pre-aggregate's `day`) diff --git a/references/pre-aggregates/getting-started.mdx b/references/pre-aggregates/getting-started.mdx new file mode 100644 index 00000000..239e286e --- /dev/null +++ b/references/pre-aggregates/getting-started.mdx @@ -0,0 +1,187 @@ +--- +title: "Getting started with pre-aggregates" +sidebarTitle: "Getting started" +description: "Define pre-aggregates in your dbt YAML, configure scheduling, and start serving queries from materialized data." +--- + +## Defining pre-aggregates + +Pre-aggregates are defined in your dbt model's YAML file under the `pre_aggregates` key in the model's `meta` (or `config.meta` for dbt v1.10+). 
+ + + + ```yaml + models: + - name: orders + meta: + pre_aggregates: + - name: orders_daily_by_status + dimensions: + - status + metrics: + - total_order_amount + - average_order_size + time_dimension: order_date + granularity: day + ``` + + + ```yaml + models: + - name: orders + config: + meta: + pre_aggregates: + - name: orders_daily_by_status + dimensions: + - status + metrics: + - total_order_amount + - average_order_size + time_dimension: order_date + granularity: day + ``` + + + +## Configuration reference + +| Property | Required | Description | +|---|---|---| +| `name` | Yes | Unique identifier for the pre-aggregate. Must contain only letters, numbers, and underscores. | +| `dimensions` | Yes | List of dimension names to include. Must contain at least one dimension. | +| `metrics` | Yes | List of metric names to include. Must contain at least one metric. | +| `time_dimension` | No | A time-based dimension for date grouping. Must be paired with `granularity`. | +| `granularity` | No | Time granularity for the `time_dimension`. Valid values: `hour`, `day`, `week`, `month`, `quarter`, `year`. Must be paired with `time_dimension`. | +| `max_rows` | No | Maximum number of rows to store in the materialization. If the aggregation exceeds this limit, the result is truncated. Must be a positive integer. | +| `refresh` | No | Schedule configuration for automatic re-materialization. See [Scheduling refreshes](#scheduling-refreshes). | + + + If you specify `time_dimension`, you **must** also specify `granularity`, and vice versa. + + +## Multiple pre-aggregates per model + +You can define multiple pre-aggregates on the same model, each targeting different query patterns. 
For example, you might want a fine-grained daily pre-aggregate for detailed dashboards and a coarser monthly one for summary views: + +```yaml +models: + - name: orders + config: + meta: + pre_aggregates: + - name: orders_daily_by_status + dimensions: + - status + metrics: + - total_order_amount + - order_count + time_dimension: order_date + granularity: day + - name: orders_monthly_summary + dimensions: + - status + metrics: + - total_order_amount + time_dimension: order_date + granularity: month + max_rows: 1000000 +``` + +When a query matches multiple pre-aggregates, Lightdash picks the smallest one (fewest dimensions, then fewest metrics). + +## Scheduling refreshes + +By default, pre-aggregates are materialized when your dbt project compiles. You can also schedule automatic refreshes with a cron expression in `refresh.cron`. Schedules run in your project's configured timezone, which defaults to UTC: + +```yaml +pre_aggregates: + - name: orders_daily_by_status + dimensions: + - status + metrics: + - total_order_amount + time_dimension: order_date + granularity: day + refresh: + cron: "0 6 * * *" # Every day at 6:00 AM in the project timezone (UTC by default) +``` + +### Materialization triggers + +Pre-aggregates can be materialized through the following triggers: + +| Trigger | When it happens | +|---|---| +| **Compile** | Automatically when your dbt project is compiled | +| **Cron** | On the schedule you define in `refresh.cron` | +| **Manual** | When you trigger a refresh from the Lightdash UI | + +## Row limits + +You can set `max_rows` to cap the size of a materialization. If the aggregation produces more rows than the limit, the result is truncated. + + + When `max_rows` is applied, some data is excluded from the materialization. Queries that match the pre-aggregate may return incomplete results. Use this setting carefully and monitor for the "max rows applied" warning in the [monitoring UI](/references/pre-aggregates/monitoring).
+ + + ## Complete example + +Here's a full model definition with a pre-aggregate, including joins, scheduling, and row limits: + +```yaml +models: + - name: orders + config: + meta: + joins: + - join: customers + sql_on: ${customers.customer_id} = ${orders.customer_id} + pre_aggregates: + - name: orders_daily_by_status + dimensions: + - status + - customers.country + metrics: + - total_order_amount + - average_order_size + time_dimension: order_date + granularity: day + max_rows: 5000000 + refresh: + cron: "0 6 * * *" + columns: + - name: order_date + config: + meta: + dimension: + type: date + - name: status + config: + meta: + dimension: + type: string + - name: amount + config: + meta: + metrics: + total_order_amount: + type: sum + average_order_size: + type: average +``` + +With this pre-aggregate, the following queries would be served from materialized data: + +- Total order amount by status, grouped by day, week, month, quarter, or year +- Average order size by status, grouped by month +- Total order amount filtered by status or customer country +- Total order amount by customer country, grouped by quarter + +These queries would **not** match and would query the warehouse directly: + +- Queries including `count_distinct` metrics +- Queries grouped by a dimension not in the pre-aggregate (for example, `customer_id`) +- Queries with hourly granularity (finer than the pre-aggregate's `day`) +- Queries with custom dimensions, custom metrics, or table calculations diff --git a/references/pre-aggregates/monitoring.mdx b/references/pre-aggregates/monitoring.mdx new file mode 100644 index 00000000..ffbd3d1a --- /dev/null +++ b/references/pre-aggregates/monitoring.mdx @@ -0,0 +1,68 @@ +--- +title: "Monitoring and debugging pre-aggregates" +sidebarTitle: "Monitoring and debugging" +description: "Track materialization status, understand why queries miss pre-aggregates, and manage refreshes." +--- + +You can monitor pre-aggregates from **Project Settings > Pre-aggregates**.
This section has two tabs: **Materializations** for tracking status and **Analytics** for query hit/miss statistics. + +## Materialization status + +The **Materializations** tab shows all your pre-aggregates and their current state. + +Pre-aggregate materializations list + +Each pre-aggregate has a materialization lifecycle: + +| Status | Meaning | +|---|---| +| **Active** | The materialization is live and serving matching queries. Only one materialization can be active per pre-aggregate at a time. | +| **In progress** | A new materialization is being built. The previous active materialization continues serving queries until the new one completes. | +| **Failed** | The materialization encountered an error. Check the error message for details. | + +You can click on any pre-aggregate to see its full details, including row count, file size, and duration. + +Pre-aggregate materialization detail + +## Hit and miss statistics + +The **Analytics** tab tracks how often queries hit or miss pre-aggregates on a daily basis. + +Pre-aggregate analytics + +You can break down the statistics by: + +- **Explore name** — which explores are benefiting from pre-aggregates +- **Query context** — whether the query came from a chart, dashboard, or the explorer +- **Chart or dashboard** — which specific saved content is hitting or missing + +Use these stats to identify opportunities for new pre-aggregates or to tune existing ones. + +## Why a query misses a pre-aggregate + +When a query doesn't match any pre-aggregate, Lightdash records the specific reason. Understanding these reasons helps you decide whether to adjust your pre-aggregate definition or accept the warehouse query. + +| Miss reason | What it means | How to fix it | +|---|---|---| +| **No pre-aggregates defined** | The explore has no pre-aggregates configured. | Add a `pre_aggregates` block to your dbt model. | +| **Dimension not in pre-aggregate** | The query includes a dimension not covered by any pre-aggregate. 
| Add the missing dimension to your pre-aggregate's `dimensions` list. | +| **Metric not in pre-aggregate** | The query includes a metric not covered by any pre-aggregate. | Add the missing metric to your pre-aggregate's `metrics` list. | +| **Filter dimension not in pre-aggregate** | A filter references a dimension not in the pre-aggregate. | Add the filter dimension to the `dimensions` list, even if it's only used for filtering and not for grouping. | +| **Non-additive metric** | The query includes a metric type that can't be re-aggregated (for example, `count_distinct` or `median`). | This metric type is not supported. See [supported metric types](/references/pre-aggregates/overview#supported-metric-types). | +| **Custom SQL metric** | A metric uses a custom SQL expression. | Custom SQL metrics are not supported by pre-aggregates. | +| **Granularity too fine** | The query requests a finer time granularity than the pre-aggregate provides (for example, `hour` on a `day` pre-aggregate). | Either set the pre-aggregate's `granularity` to a finer value (for example, `hour`) or accept the warehouse query for this use case. | +| **Custom dimension present** | The query uses a custom dimension. | Custom dimensions are not supported by pre-aggregates. | +| **Custom metric present** | The query uses a custom metric. | Custom metrics are not supported by pre-aggregates. | +| **Table calculation present** | The query includes table calculations. | Table calculations are not supported by pre-aggregates. | +| **User bypass** | The user explicitly bypassed the pre-aggregate cache. | No action needed; this is intentional. | + +## Dashboard pre-aggregate view + +You can also monitor and manage pre-aggregates directly from any dashboard. Open the dashboard menu to access pre-aggregate options. + +Pre-aggregate options in the dashboard menu + +1. **Pre-aggregation audit** — Shows which tiles in the dashboard are hitting or missing pre-aggregates, and why. Available to editors, developers, and admins. +2.
**Rebuild pre-aggregates** — Triggers a manual re-materialization for the pre-aggregates used by this dashboard. Available to developers and admins only. + +Pre-aggregate audit view for a dashboard \ No newline at end of file diff --git a/references/pre-aggregates/overview.mdx b/references/pre-aggregates/overview.mdx new file mode 100644 index 00000000..edc3b027 --- /dev/null +++ b/references/pre-aggregates/overview.mdx @@ -0,0 +1,71 @@ +--- +title: "Pre-aggregates" +sidebarTitle: "Overview" +description: "Speed up dashboards and reduce warehouse costs by serving queries from pre-computed, materialized summaries." +--- + + + **Availability:** Pre-aggregates are an [Early Access](/references/workspace/feature-maturity-levels) feature available on **Enterprise plans** only. + + +Pre-aggregates let you define materialized summaries of your data directly in your dbt YAML. When a user runs a query in Lightdash, the system checks if the query can be answered from a pre-aggregate instead of querying your warehouse. If it matches, the query is served from the pre-computed results — making it significantly faster and reducing warehouse load. + +This is especially useful for dashboards with high traffic or expensive aggregations that don't need real-time data. + + + + Define pre-aggregates in your dbt project and configure scheduling. + + + Track materialization status, debug query matching, and view hit/miss stats. + + + +## How it works + +Pre-aggregates follow a four-step cycle: + +1. **Define** — You add a `pre_aggregates` block to your dbt model YAML, specifying which dimensions and metrics to include. +2. **Materialize** — Lightdash runs the aggregation query against your warehouse and stores the results. This happens automatically on compile, on a cron schedule you define, or when you trigger it manually. +3. **Match** — When a user runs a query, Lightdash checks if every requested dimension, metric, and filter is covered by a pre-aggregate. +4. 
**Serve** — If a match is found, the query is served from the materialized data instead of hitting your warehouse. + +{/* TODO: Add architecture diagram here showing the define → materialize → match → serve cycle */} + +## Query matching + +When a user runs a query, Lightdash automatically checks if a pre-aggregate can serve the results. A pre-aggregate matches when **all** of the following are true: + +- Every dimension in the query is included in the pre-aggregate +- Every metric in the query is included in the pre-aggregate +- Every dimension used in **filters** is included in the pre-aggregate +- All metrics use [supported metric types](#supported-metric-types) +- The query does not contain custom dimensions, custom metrics, or table calculations +- If the query uses a time dimension, the requested granularity is **equal to or coarser than** the pre-aggregate's granularity (for example, a `day` pre-aggregate can serve `day`, `week`, `month`, `quarter`, or `year` queries, but not `hour`) + +When multiple pre-aggregates match a query, Lightdash picks the smallest one (fewest dimensions, with fewest metrics as the tiebreaker). + +### Dimensions from joined tables + +Pre-aggregates support dimensions from joined tables. Reference them by their full name (for example, `customers.first_name`) in the `dimensions` list. + +## Supported metric types + +Pre-aggregates support metrics that can be re-aggregated from pre-computed results: + +- `sum` +- `count` +- `min` +- `max` +- `average` + +### Unsupported metric types + +Queries that include any of the following metric types will **not** match a pre-aggregate and will query the warehouse directly: + +- `count_distinct`, `sum_distinct`, `average_distinct` +- `median`, `percentile` +- `percent_of_total`, `percent_of_previous` +- `running_total` +- `number`, `string`, `date`, `timestamp`, `boolean` +- Metrics with custom SQL expressions
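The query-matching rules in the overview page can be read as a predicate over a query and a pre-aggregate, followed by a "smallest match wins" selection. The TypeScript below is a minimal sketch under stated assumptions, not Lightdash's implementation. The `PreAggregate` and `Query` shapes and the `matches` and `pickPreAggregate` functions are hypothetical names. The time dimension is modelled separately from `dimensions`, mirroring the YAML, and filters on the time dimension are not modelled. The two example pre-aggregates mirror `orders_daily_by_status` and `orders_monthly_summary` from the getting-started page.

```typescript
// Hypothetical types for illustration; these are not Lightdash's internal API.
type Granularity = "hour" | "day" | "week" | "month" | "quarter" | "year";

// Ordered finest to coarsest: a higher index means a coarser grain.
const GRAIN_ORDER: Granularity[] = ["hour", "day", "week", "month", "quarter", "year"];

// Metric types the docs list as re-aggregatable.
const REAGGREGATABLE_TYPES = ["sum", "count", "min", "max", "average"];

interface PreAggregate {
  name: string;
  dimensions: string[];
  metrics: string[];
  timeDimension?: string;
  granularity?: Granularity;
}

interface Query {
  dimensions: string[]; // grouping dimensions, excluding the time dimension
  metrics: { name: string; type: string }[];
  filterDimensions: string[]; // dimensions used in filters (time filters not modelled)
  timeDimension?: { name: string; granularity: Granularity };
  hasCustomFields: boolean; // custom dimensions, custom metrics, or table calculations
}

const covers = (list: string[], item: string): boolean => list.indexOf(item) !== -1;

function matches(pa: PreAggregate, q: Query): boolean {
  if (q.hasCustomFields) return false;
  if (!q.dimensions.every((d) => covers(pa.dimensions, d))) return false;
  if (!q.filterDimensions.every((d) => covers(pa.dimensions, d))) return false;
  const metricsOk = q.metrics.every(
    (m) => covers(pa.metrics, m.name) && covers(REAGGREGATABLE_TYPES, m.type),
  );
  if (!metricsOk) return false;
  if (q.timeDimension === undefined) return true;
  if (pa.timeDimension !== q.timeDimension.name || pa.granularity === undefined) return false;
  // The requested grain must be equal to or coarser than the stored grain.
  return GRAIN_ORDER.indexOf(q.timeDimension.granularity) >= GRAIN_ORDER.indexOf(pa.granularity);
}

// Among matching candidates, prefer fewest dimensions, then fewest metrics.
function pickPreAggregate(candidates: PreAggregate[], q: Query): PreAggregate | undefined {
  const matching = candidates.filter((pa) => matches(pa, q));
  matching.sort(
    (a, b) => a.dimensions.length - b.dimensions.length || a.metrics.length - b.metrics.length,
  );
  return matching[0];
}

const ordersDaily: PreAggregate = {
  name: "orders_daily_by_status",
  dimensions: ["status"],
  metrics: ["total_order_amount", "order_count"],
  timeDimension: "order_date",
  granularity: "day",
};
const ordersMonthly: PreAggregate = {
  name: "orders_monthly_summary",
  dimensions: ["status"],
  metrics: ["total_order_amount"],
  timeDimension: "order_date",
  granularity: "month",
};

// Total order amount by status at a given grain.
function amountByStatus(granularity: Granularity): Query {
  return {
    dimensions: ["status"],
    metrics: [{ name: "total_order_amount", type: "sum" }],
    filterDimensions: [],
    timeDimension: { name: "order_date", granularity },
    hasCustomFields: false,
  };
}

console.log(pickPreAggregate([ordersDaily, ordersMonthly], amountByStatus("month"))?.name);
console.log(pickPreAggregate([ordersDaily, ordersMonthly], amountByStatus("week"))?.name);
console.log(pickPreAggregate([ordersDaily, ordersMonthly], amountByStatus("hour"))?.name);
```

In this sketch a monthly query matches both candidates; they tie on dimension count, so the one with fewer metrics (`orders_monthly_summary`) wins. A weekly query can only use the daily pre-aggregate, because `week` is finer than `month`, and an hourly query matches neither, so it would fall through to the warehouse.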
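The split between supported and unsupported metric types comes down to whether a metric can be rebuilt from rows that are already aggregated. The sketch below assumes one common technique: keeping a per-row sum and count so that `average` is recomputed as total sum divided by total count. The docs do not say how Lightdash stores averages internally, so treat `DailyRow`, `MonthlyRollup`, and `rollUpToMonth` as hypothetical. The sketch also shows why `count_distinct` cannot be rolled up: distinct counts per day do not add up to the distinct count for the month.

```typescript
// Illustrative only: a hypothetical daily pre-aggregated row, not Lightdash's storage format.
interface DailyRow {
  day: string;
  month: string;
  sum: number; // sum(amount) for the day
  count: number; // number of orders for the day
  min: number;
  max: number;
}

interface MonthlyRollup {
  sum: number;
  count: number;
  min: number;
  max: number;
  average: number;
}

// Roll daily rows up to months using only the stored aggregates.
function rollUpToMonth(rows: DailyRow[]): Record<string, MonthlyRollup> {
  const out: Record<string, MonthlyRollup> = {};
  for (const r of rows) {
    const acc = out[r.month] ?? { sum: 0, count: 0, min: Infinity, max: -Infinity, average: 0 };
    acc.sum += r.sum; // sums add up
    acc.count += r.count; // counts add up
    acc.min = Math.min(acc.min, r.min); // min of mins
    acc.max = Math.max(acc.max, r.max); // max of maxes
    // Average is recomputed from the totals, never from the daily averages.
    acc.average = acc.count > 0 ? acc.sum / acc.count : 0;
    out[r.month] = acc;
  }
  return out;
}

const dailyRows: DailyRow[] = [
  { day: "2024-01-01", month: "2024-01", sum: 30, count: 2, min: 10, max: 20 }, // orders of 10 and 20
  { day: "2024-01-02", month: "2024-01", sum: 60, count: 1, min: 60, max: 60 }, // one order of 60
];

const january = rollUpToMonth(dailyRows)["2024-01"]!;
console.log(january.average); // 30, that is 90 / 3

// Averaging the daily averages instead weights each day equally: (15 + 60) / 2.
const naiveAverage = dailyRows.reduce((s, r) => s + r.sum / r.count, 0) / dailyRows.length;
console.log(naiveAverage); // 37.5, which is wrong

// count_distinct: customer "a" ordered on both days.
function distinctCount(values: string[]): number {
  const seen: Record<string, true> = {};
  for (const v of values) seen[v] = true;
  return Object.keys(seen).length;
}
const jan1Customers = ["a", "b"];
const jan2Customers = ["a"];
const summedDailyDistinct = distinctCount(jan1Customers) + distinctCount(jan2Customers);
const trueMonthlyDistinct = distinctCount(jan1Customers.concat(jan2Customers));
console.log(summedDailyDistinct, trueMonthlyDistinct); // 3 2
```

Summing per-day distinct counts double-counts customer `a`, and nothing in the daily totals records which customers were already seen. That is why metric types such as `count_distinct` send the query to the warehouse instead of a pre-aggregate.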