Commit 34d36c8

docs: reorganize pre-aggregates documentation into overview, getting started, and monitoring sections
1 parent 89cb50d commit 34d36c8

10 files changed

Lines changed: 338 additions & 160 deletions

docs.json

Lines changed: 12 additions & 1 deletion
@@ -205,7 +205,14 @@
205205
"references/dimensions",
206206
"references/tables",
207207
"references/joins",
208-
"references/pre-aggregates",
208+
{
209+
"group": "Pre-aggregates",
210+
"pages": [
211+
"references/pre-aggregates/overview",
212+
"references/pre-aggregates/getting-started",
213+
"references/pre-aggregates/monitoring"
214+
]
215+
},
209216
"references/lightdash-cli",
210217
"references/lightdash-config-yml",
211218
"references/sql-variables"
@@ -412,6 +419,10 @@
412419
}
413420
},
414421
"redirects": [
422+
{
423+
"source": "/references/pre-aggregates",
424+
"destination": "/references/pre-aggregates/overview"
425+
},
415426
{
416427
"source": "/guides/ai-analyst",
417428
"destination": "/guides/ai-agents"

references/pre-aggregates.mdx

Lines changed: 0 additions & 159 deletions
This file was deleted.
Lines changed: 187 additions & 0 deletions
@@ -0,0 +1,187 @@
1+
---
2+
title: "Getting started with pre-aggregates"
3+
sidebarTitle: "Getting started"
4+
description: "Define pre-aggregates in your dbt YAML, configure scheduling, and start serving queries from materialized data."
5+
---
6+
7+
## Defining pre-aggregates
8+
9+
Pre-aggregates are defined in your dbt model's YAML file under the `pre_aggregates` key in the model's `meta` (or `config.meta` for dbt v1.10+).
10+
11+
<Tabs>
12+
<Tab title="dbt v1.9 and earlier">
13+
```yaml
models:
  - name: orders
    meta:
      pre_aggregates:
        - name: orders_daily_by_status
          dimensions:
            - status
          metrics:
            - total_order_amount
            - average_order_size
          time_dimension: order_date
          granularity: day
```
27+
</Tab>
28+
<Tab title="dbt v1.10+ and Fusion">
29+
```yaml
models:
  - name: orders
    config:
      meta:
        pre_aggregates:
          - name: orders_daily_by_status
            dimensions:
              - status
            metrics:
              - total_order_amount
              - average_order_size
            time_dimension: order_date
            granularity: day
```
44+
</Tab>
45+
</Tabs>
46+
47+
## Configuration reference
48+
49+
| Property | Required | Description |
50+
|---|---|---|
51+
| `name` | Yes | Unique identifier for the pre-aggregate. Must contain only letters, numbers, and underscores. |
52+
| `dimensions` | Yes | List of dimension names to include. Must contain at least one dimension. |
53+
| `metrics` | Yes | List of metric names to include. Must contain at least one metric. |
54+
| `time_dimension` | No | A time-based dimension for date grouping. Must be paired with `granularity`. |
55+
| `granularity` | No | Time granularity for the `time_dimension`. Valid values: `hour`, `day`, `week`, `month`, `quarter`, `year`. Must be paired with `time_dimension`. |
56+
| `max_rows` | No | Maximum number of rows to store in the materialization. If the aggregation exceeds this limit, the result is truncated. Must be a positive integer. |
57+
| `refresh` | No | Schedule configuration for automatic re-materialization. See [Scheduling refreshes](#scheduling-refreshes). |
58+
59+
<Note>
60+
If you specify `time_dimension`, you **must** also specify `granularity`, and vice versa.
61+
</Note>
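
As a minimal sketch of the other valid combination, the fragment below omits both `time_dimension` and `granularity`, which the table above marks as optional. The name `orders_by_status` is a hypothetical example and does not appear elsewhere on this page.

```yaml
pre_aggregates:
  - name: orders_by_status  # hypothetical name, for illustration only
    dimensions:
      - status
    metrics:
      - total_order_amount
    # Neither time_dimension nor granularity is set.
    # Setting only one of the two is not allowed.
```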
62+
63+
## Multiple pre-aggregates per model
64+
65+
You can define multiple pre-aggregates on the same model, each targeting different query patterns. For example, you might want a fine-grained daily pre-aggregate for detailed dashboards and a coarser monthly one for summary views:
66+
67+
```yaml
models:
  - name: orders
    config:
      meta:
        pre_aggregates:
          - name: orders_daily_by_status
            dimensions:
              - status
            metrics:
              - total_order_amount
              - order_count
            time_dimension: order_date
            granularity: day
          - name: orders_monthly_summary
            dimensions:
              - status
            metrics:
              - total_order_amount
            time_dimension: order_date
            granularity: month
            max_rows: 1000000
```
90+
91+
When a query matches multiple pre-aggregates, Lightdash picks the smallest one.
92+
93+
## Scheduling refreshes
94+
95+
By default, pre-aggregates are materialized when your dbt project compiles. You can also schedule automatic refreshes with a cron expression, which is evaluated in your project's configured timezone (UTC by default):
96+
97+
```yaml
pre_aggregates:
  - name: orders_daily_by_status
    dimensions:
      - status
    metrics:
      - total_order_amount
    time_dimension: order_date
    granularity: day
    refresh:
      cron: "0 6 * * *"  # Every day at 6:00 AM in the project's timezone (UTC by default)
```
109+
110+
### Materialization triggers
111+
112+
Pre-aggregates can be materialized through the following triggers:
113+
114+
| Trigger | When it happens |
115+
|---|---|
116+
| **Compile** | Automatically when your dbt project is compiled |
117+
| **Cron** | On the schedule you define in `refresh.cron` |
118+
| **Manual** | When you trigger a refresh from the Lightdash UI |
119+
120+
## Row limits
121+
122+
You can set `max_rows` to cap the size of a materialization. If the aggregation produces more rows than the limit, the result is truncated.
123+
124+
<Warning>
125+
When `max_rows` is applied, some data is excluded from the materialization. Queries that match the pre-aggregate may return incomplete results. Use this setting carefully and monitor for the "max rows applied" warning in the [monitoring UI](/references/pre-aggregates/monitoring).
126+
</Warning>
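
As a rough sketch, a capped pre-aggregate could be declared as follows. The name `orders_daily_capped` and the limit of `500000` are illustrative assumptions, not recommended values.

```yaml
pre_aggregates:
  - name: orders_daily_capped  # hypothetical name, for illustration only
    dimensions:
      - status
    metrics:
      - total_order_amount
    time_dimension: order_date
    granularity: day
    max_rows: 500000  # illustrative cap; must be a positive integer
```

If the aggregation produces more rows than this cap, the stored result is truncated, so matching queries may return incomplete results.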
127+
128+
129+
## Complete example
130+
131+
Here's a full model definition with a pre-aggregate, including joins, scheduling, and row limits:
132+
133+
```yaml
models:
  - name: orders
    config:
      meta:
        joins:
          - join: customers
            sql_on: ${customers.customer_id} = ${orders.customer_id}
        pre_aggregates:
          - name: orders_daily_by_status
            dimensions:
              - status
              - customers.country
            metrics:
              - total_order_amount
              - average_order_size
            time_dimension: order_date
            granularity: day
            max_rows: 5000000
            refresh:
              cron: "0 6 * * *"
    columns:
      - name: order_date
        config:
          meta:
            dimension:
              type: date
      - name: status
        config:
          meta:
            dimension:
              type: string
      - name: amount
        config:
          meta:
            metrics:
              total_order_amount:
                type: sum
              average_order_size:
                type: average
```
174+
175+
With this pre-aggregate, the following queries would be served from materialized data:
176+
177+
- Total order amount by status, grouped by day, week, month, or year
178+
- Average order size by status, grouped by month
179+
- Total order amount filtered by status or customer country
180+
- Order amount by customer country, grouped by quarter
181+
182+
These queries would **not** match and would query the warehouse directly:
183+
184+
- Queries including `count_distinct` metrics
185+
- Queries grouped by a dimension not in the pre-aggregate (for example, `customer_id`)
186+
- Queries with hourly granularity (finer than the pre-aggregate's `day`)
187+
- Queries with custom dimensions, custom metrics, or table calculations
