You are joining the data team at Venatus, a programmatic advertising company. Raw data from our ad-serving platform has been pre-loaded into a ClickHouse data warehouse. Your task is to:
- Build a clean, tested, and documented analytics layer using dbt.
- Create a dashboard in Lightdash to surface key business insights.
- Investigate the data and document what you find.
- This challenge has been intentionally designed to detect and cause problems for AI. Do not rely on AI to write your markdown summaries.
This exercise is designed to assess the core skills of an analytics engineer: data modeling, transformation, testing, documentation, and communicating with data.
| Tool | Purpose | Access |
|---|---|---|
| ClickHouse | Columnar data warehouse (raw data pre-loaded) | http://localhost:8123/play |
| dbt (dbt-clickhouse) | Data transformation framework | Runs via Docker |
| Lightdash | Open-source BI tool (dbt-native) | http://localhost:8880 |
Everything runs in Docker. One command to start.
- Docker and Docker Compose (v2+)
- `make` (standard on macOS / Linux)
- Git
```shell
git clone git@github.com:Venatus/analytics-coding-challenge.git
cd analytics-coding-challenge
make up
```
`make up` now also runs an automatic dbt bootstrap (deps, run, compile) after the services come up.
Wait ~60 seconds for all services to initialise and bootstrap. You'll see the URLs printed in the terminal.
Open the ClickHouse play UI at http://localhost:8123/play and run:
```sql
SELECT count(*) FROM raw.ad_events;
-- Should return ~131,000 rows

SELECT * FROM raw.publishers LIMIT 5;
SELECT * FROM raw.campaigns LIMIT 5;
SELECT * FROM raw.ad_units LIMIT 5;
```
You could also connect DBeaver to ClickHouse if you prefer a UI database manager.
If you change models after startup, rerun dbt commands manually:
```shell
make dbt-deps     # Install dbt packages (dbt-utils, etc.)
make dbt-run      # Run your models (once you've written them)
make dbt-test     # Run your tests
make dbt-compile  # Compile project (generates manifest.json for Lightdash)
```
Lightdash is pre-configured with an account and a project connected to ClickHouse. Just log in:
- URL: http://localhost:8880
- Email: `admin@lightdash.com`
- Password: `admin123!`
After writing your dbt models, run `make dbt-compile`, then sync the project in Lightdash (Settings → Project → Syncing) so it can discover your models.
At startup, a single neutral starter dashboard/chart is auto-seeded so the Lightdash project is visibly functional. It is intentionally unrelated to the challenge questions.
Committed dashboard-as-code files in lightdash/charts/ and lightdash/dashboards/ are also auto-synced at startup, so reviewers can run make up and immediately see candidate dashboards.
Candidates should build dashboards in the Lightdash UI, then export dashboard code and commit it to the repo.
Recommended flow:
- Create/modify charts and dashboards in Lightdash UI.
- Log in once to Lightdash CLI (runs inside Docker):
  ```shell
  make lightdash-login
  ```
- Export dashboard code:
  ```shell
  # Export all dashboards/charts
  make lightdash-download-all
  ```
- Commit exported files under `lightdash/` (typically `lightdash/dashboards/` and `lightdash/charts/`).
Troubleshooting:
- If models are missing in Lightdash: run `make dbt-compile` and refresh dbt in Lightdash.
- If CLI download fails with auth errors: rerun `make lightdash-login`.
- If CLI commands fail because the service is not running: run `make up` first.
The following tables are pre-loaded in the `raw` database:

`raw.ad_events`: ad-serving events (impressions, clicks, viewable impressions) over ~30 days.
| Column | Type | Notes |
|---|---|---|
| `event_id` | String | Event identifier |
| `event_type` | String | `impression`, `click`, `viewable_impression` |
| `event_timestamp` | DateTime64(3) | When the event occurred |
| `publisher_id` | UInt32 | Publisher (website) |
| `site_domain` | String | Domain where the ad was served |
| `ad_unit_id` | String | Ad placement identifier |
| `campaign_id` | Nullable(UInt32) | Advertiser campaign (NULL = unfilled) |
| `advertiser_id` | Nullable(UInt32) | Advertiser identifier |
| `device_type` | String | `desktop`, `mobile`, `tablet`, `ctv` |
| `country_code` | String | Two-letter country code |
| `browser` | String | Browser name |
| `revenue_usd` | Decimal64(6) | Revenue in USD |
| `bid_floor_usd` | Decimal64(6) | Minimum bid price |
| `is_filled` | UInt8 | 1 = ad was served, 0 = unfilled |
| `_loaded_at` | DateTime | When the ETL pipeline loaded this row |
- `raw.publishers`: publisher / website dimension data (20 publishers).
- `raw.campaigns`: advertiser campaign dimension data (27 campaigns).
- `raw.ad_units`: ad unit / placement configuration (60 ad units across all publishers).
Note: Treat this data as you would data from a production system. Explore it thoroughly before modeling — not all of it is clean.
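As a starting point, a couple of exploratory checks along these lines (using the column names from the schema above) can surface duplicates or unexpected values — what they actually return depends on the seeded data:

```sql
-- Duplicate event_ids (a raw feed can contain re-loaded batches)
SELECT event_id, count(*) AS copies
FROM raw.ad_events
GROUP BY event_id
HAVING copies > 1
LIMIT 10;

-- Value distribution of the low-cardinality columns
SELECT event_type, device_type, count(*) AS n
FROM raw.ad_events
GROUP BY event_type, device_type
ORDER BY n DESC;
```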
Build a dbt project that transforms the raw data into a clean, reliable analytics layer.
Required staging models (`models/staging/`):
- `stg_ad_events`: deduplicated, cleaned ad events
- `stg_publishers`: cleaned publisher dimension
- `stg_campaigns`: cleaned campaign dimension

Required mart models (`models/marts/`):
- `dim_publishers`: publisher dimension table
- `fct_ad_events_daily`: daily aggregated ad metrics at the grain of your choosing (document it)
  - Required metrics: `impressions`, `clicks`, `revenue_usd`, `fill_rate`

Nice to have (if time permits):
- `stg_ad_units`, `dim_campaigns`, `dim_ad_units`
- Additional metrics: `viewable_impressions`, `ctr`, `viewability_rate`
- `fct_publisher_performance`: publisher-level daily summary
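To make the deduplication requirement concrete, here is one possible shape for `stg_ad_events` — an illustrative sketch only, not a reference solution. It assumes a dbt source named `raw` and keeps the most recently loaded copy of each `event_id`; the actual cleaning rules should follow from what you find in the data:

```sql
-- models/staging/stg_ad_events.sql (illustrative sketch)
with ranked as (
    select
        *,
        row_number() over (
            partition by event_id
            order by _loaded_at desc
        ) as rn
    from {{ source('raw', 'ad_events') }}
)

select
    event_id,
    event_type,
    event_timestamp,
    publisher_id,
    campaign_id,
    device_type,
    country_code,
    revenue_usd,
    is_filled
from ranked
where rn = 1
```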
Expectations:
- Follow dbt best practices: `source()`, `ref()`, staging → marts pattern, consistent naming
- Handle data quality issues you discover, and document your decisions
- Write tests in `schema.yml` files:
  - `unique` and `not_null` tests on primary / surrogate keys
  - At least one other test type (`accepted_values`, `relationships`, or a custom singular test)
- Write documentation:
  - Descriptions for all models and key columns
  - Document any assumptions or business logic applied
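For reference, the test and documentation expectations above can live together in one `schema.yml`. A minimal sketch (model and column names from the required list; extend to your own models):

```yaml
# models/staging/schema.yml (illustrative sketch)
version: 2

models:
  - name: stg_ad_events
    description: "Deduplicated, cleaned ad events from raw.ad_events"
    columns:
      - name: event_id
        description: "Unique event identifier (primary key)"
        tests:
          - unique
          - not_null
      - name: event_type
        tests:
          - accepted_values:
              values: ['impression', 'click', 'viewable_impression']
```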
Connect your dbt project to Lightdash and build a dashboard that answers these business questions:
- Revenue Overview — Total revenue over time, broken down by publisher.
- Fill Rate Analysis — What is the fill rate by publisher? Any publishers that stand out?
- One insight of your choice — What else is interesting or concerning in this data?
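The brief does not pin down `fill_rate`, so document whichever definition you choose. One common definition is filled requests over total requests; a sketch of that at a daily/publisher grain (whether unfilled rows also carry `event_type = 'impression'` is something to verify against the data first):

```sql
-- One possible fill_rate definition (sketch; adjust to your chosen grain)
SELECT
    toDate(event_timestamp) AS event_date,
    publisher_id,
    sum(is_filled) / count(*) AS fill_rate
FROM raw.ad_events
WHERE event_type = 'impression'
GROUP BY event_date, publisher_id
ORDER BY event_date, publisher_id;
```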
Write a `DESIGN.md` in the repo root covering:
- Your data modeling approach — Why did you structure it this way?
- Data quality issues — What did you find? How did you handle each?
- Trade-offs — What shortcuts did you take? What would you change with more time?
- Production readiness — How would you extend this for a real production deployment?
While exploring the data, you should notice anomalous patterns that suggest a data quality or business integrity issue.
Document what you find, why it matters from a business perspective, and how you would handle it in a production analytics pipeline.
Hint: Not all traffic is created equal.
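Without giving the answer away, one generic way to follow a hint like this is to compare engagement ratios across segments. For example, a sketch of per-publisher CTR (what counts as "implausible" is a judgment call, and this is only one of several angles worth checking):

```sql
-- Click-through rate by publisher; an implausibly high CTR can
-- indicate invalid traffic
SELECT
    publisher_id,
    countIf(event_type = 'click') AS clicks,
    countIf(event_type = 'impression') AS impressions,
    round(clicks / impressions, 4) AS ctr
FROM raw.ad_events
GROUP BY publisher_id
HAVING impressions > 0
ORDER BY ctr DESC
LIMIT 10;
```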
| # | Deliverable | Format |
|---|---|---|
| 1 | Working dbt project | Models, tests, docs in dbt/ |
| 2 | Lightdash dashboard | Dashboard artifact/code committed in repo |
| 3 | Design document | DESIGN.md in repo root |
We will ask you to walk us through your solution in a follow-up interview.
```shell
make up        # Start all services
make dbt-deps  # Install dbt packages
make dbt-run   # Run all models
make dbt-test  # Run all tests
```
Then we'll check:
- Models build without errors
- Tests pass
- Tables exist in the ClickHouse `analytics` database and contain rows
- Lightdash dashboard is functional
- `DESIGN.md` is thoughtful and complete
| Criteria | Weight | What we're looking for |
|---|---|---|
| Data Modeling | 30% | Correct grain, clean staging → marts separation, appropriate column types and naming |
| dbt Best Practices | 20% | Proper use of sources, refs, tests, docs, config; consistent naming conventions |
| Data Quality Handling | 15% | Identifying issues in raw data, appropriate handling in staging, clear documentation |
| Dashboard & Analysis | 25% | Insightful visualisations that answer the business questions clearly |
| Communication | 10% | Clear design doc, meaningful commit messages, coherent explanation of decisions |
2–4 hours. Focus on quality over quantity. A well-modeled subset with thorough tests and documentation is better than a sprawling project with no tests.
- Commit frequently with meaningful, descriptive commit messages. We want to see your iterative thought process — not a single "final" commit.
- Document your assumptions. When the data or requirements are ambiguous, state your assumption and proceed.
- The raw data contains many intentional data quality issues. Finding and handling them is a core part of this exercise.
- You may use any dbt packages you find helpful (e.g., `dbt-utils`, `dbt-expectations`).
- If you get stuck on infrastructure, document the issue and focus on the modeling work.
- Do not modify `clickhouse/init-db.sql` or `docker-compose.yml` (unless fixing a genuine bug; document it if you do).
```shell
make up               # Start all services
make down             # Stop all services
make restart          # Restart all services
make logs             # Follow service logs
make dbt-deps         # Install dbt packages
make dbt-run          # Run all dbt models
make dbt-test         # Run all dbt tests
make dbt-compile      # Compile dbt project (for Lightdash)
make dbt-docs         # Generate dbt documentation
make dbt-shell        # Open a bash shell in the dbt container
make lightdash-login  # Authenticate Lightdash CLI inside Docker
make lightdash-download-dashboard LIGHTDASH_DASHBOARD=<slug|uuid|url>  # Export one dashboard as code via Docker CLI
make lightdash-download-all  # Export all dashboards/charts as code via Docker CLI
make clickhouse-client       # Open ClickHouse CLI
make clean                   # Remove all volumes and start fresh
```