You are joining the data team at Venatus, a programmatic advertising company. Raw data from our ad-serving platform has been pre-loaded into a ClickHouse data warehouse. Your task is to:
- Build a clean, tested, and documented analytics layer using dbt.
- Create a dashboard in Lightdash to surface key business insights.
- Investigate the data and document what you find.
- This challenge has been intentionally designed to detect and cause problems for AI. Do not rely on AI to write your markdown summaries.
This exercise is designed to assess the core skills of an analytics engineer: data modeling, transformation, testing, documentation, and communicating with data.
| Tool | Purpose | Access |
|---|---|---|
| ClickHouse | Columnar data warehouse (raw data pre-loaded) | http://localhost:8123/play |
| dbt (dbt-clickhouse) | Data transformation framework | Runs via Docker |
| Lightdash | Open-source BI tool (dbt-native) | http://localhost:8880 |
Everything runs in Docker. One command to start.
- Docker and Docker Compose (v2+)
- `make` (standard on macOS / Linux)
- Git
```shell
git clone git@github.com:Venatus/analytics-coding-challenge.git
cd analytics-coding-challenge
make up
```
`make up` now also runs an automatic dbt bootstrap (deps, run, compile) after the services come up.
Wait ~60 seconds for all services to initialise and bootstrap. You'll see the URLs printed in the terminal.
Open the ClickHouse play UI at http://localhost:8123/play and run:
```sql
SELECT count(*) FROM raw.ad_events;
-- Should return ~131,000 rows

SELECT * FROM raw.publishers LIMIT 5;
SELECT * FROM raw.campaigns LIMIT 5;
SELECT * FROM raw.ad_units LIMIT 5;
```
You could also connect DBeaver to ClickHouse if you prefer a UI database manager.
If you change models after startup, rerun dbt commands manually:
```shell
make dbt-deps     # Install dbt packages (dbt-utils, etc.)
make dbt-run      # Run your models (once you've written them)
make dbt-test     # Run your tests
make dbt-compile  # Compile project (generates manifest.json for Lightdash)
```
Lightdash is pre-configured with an account and a project connected to ClickHouse. Just log in:
- URL: http://localhost:8880
- Email: `admin@lightdash.com`
- Password: `admin123!`
After writing your dbt models, run `make dbt-compile`, then sync the project in Lightdash (Settings → Project → Syncing) so it can discover your models.
At startup, a single neutral starter dashboard/chart is auto-seeded so the Lightdash project is visibly functional. It is intentionally unrelated to the challenge questions.
Committed dashboard-as-code files in lightdash/charts/ and lightdash/dashboards/ are also auto-synced at startup, so reviewers can run make up and immediately see candidate dashboards.
Candidates should build dashboards in the Lightdash UI, then export dashboard code and commit it to the repo.
Recommended flow:
- Create/modify charts and dashboards in Lightdash UI.
- Log in once to Lightdash CLI (runs inside Docker):
  ```shell
  make lightdash-login
  ```
- Export dashboard code:
  ```shell
  # Export all dashboards/charts
  make lightdash-download-all
  ```
- Commit exported files under `lightdash/` (typically `lightdash/dashboards/` and `lightdash/charts/`).
Troubleshooting:
- If models are missing in Lightdash: run `make dbt-compile` and refresh dbt in Lightdash.
- If CLI download fails with auth errors: rerun `make lightdash-login`.
- If CLI commands fail because the service is not running: run `make up` first.
The following tables are pre-loaded in the `raw` database:

`raw.ad_events`: ad-serving events (impressions, clicks, viewable impressions) over ~30 days.
| Column | Type | Notes |
|---|---|---|
| `event_id` | String | Event identifier |
| `event_type` | String | `impression`, `click`, `viewable_impression` |
| `event_timestamp` | DateTime64(3) | When the event occurred |
| `publisher_id` | UInt32 | Publisher (website) |
| `site_domain` | String | Domain where the ad was served |
| `ad_unit_id` | String | Ad placement identifier |
| `campaign_id` | Nullable(UInt32) | Advertiser campaign (NULL = unfilled) |
| `advertiser_id` | Nullable(UInt32) | Advertiser identifier |
| `device_type` | String | `desktop`, `mobile`, `tablet`, `ctv` |
| `country_code` | String | Two-letter country code |
| `browser` | String | Browser name |
| `revenue_usd` | Decimal64(6) | Revenue in USD |
| `bid_floor_usd` | Decimal64(6) | Minimum bid price |
| `is_filled` | UInt8 | 1 = ad was served, 0 = unfilled |
| `_loaded_at` | DateTime | When the ETL pipeline loaded this row |
- `raw.publishers`: publisher / website dimension data (20 publishers).
- `raw.campaigns`: advertiser campaign dimension data (27 campaigns).
- `raw.ad_units`: ad unit / placement configuration (60 ad units across all publishers).
Note: Treat this data as you would data from a production system. Explore it thoroughly before modeling — not all of it is clean.
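As a starting point, a couple of exploratory checks along these lines (using the column names from the schema above) can surface duplicates or unexpected values — what they actually return depends on the seeded data:

```sql
-- Duplicate event_ids (a raw feed can contain re-loaded batches)
SELECT event_id, count(*) AS copies
FROM raw.ad_events
GROUP BY event_id
HAVING copies > 1
LIMIT 10;

-- Value distribution of the low-cardinality columns
SELECT event_type, device_type, count(*) AS n
FROM raw.ad_events
GROUP BY event_type, device_type
ORDER BY n DESC;
```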
Build a dbt project that transforms the raw data into a clean, reliable analytics layer.
Required staging models (`models/staging/`):
- `stg_ad_events`: deduplicated, cleaned ad events
- `stg_publishers`: cleaned publisher dimension
- `stg_campaigns`: cleaned campaign dimension

Required mart models (`models/marts/`):
- `dim_publishers`: publisher dimension table
- `fct_ad_events_daily`: daily aggregated ad metrics at the grain of your choosing (document it)
  - Required metrics: `impressions`, `clicks`, `revenue_usd`, `fill_rate`

Nice to have (if time permits):
- `stg_ad_units`, `dim_campaigns`, `dim_ad_units`
- Additional metrics: `viewable_impressions`, `ctr`, `viewability_rate`
- `fct_publisher_performance`: publisher-level daily summary
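To make the deduplication requirement concrete, here is one possible shape for `stg_ad_events` — an illustrative sketch only, not a reference solution. It assumes a dbt source named `raw` and keeps the most recently loaded copy of each `event_id`; the actual cleaning rules should follow from what you find in the data:

```sql
-- models/staging/stg_ad_events.sql (illustrative sketch)
with ranked as (
    select
        *,
        row_number() over (
            partition by event_id
            order by _loaded_at desc
        ) as rn
    from {{ source('raw', 'ad_events') }}
)

select
    event_id,
    event_type,
    event_timestamp,
    publisher_id,
    campaign_id,
    device_type,
    country_code,
    revenue_usd,
    is_filled
from ranked
where rn = 1
```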
Expectations:
- Follow dbt best practices: `source()`, `ref()`, staging → marts pattern, consistent naming
- Handle data quality issues you discover, and document your decisions
- Write tests in `schema.yml` files:
  - `unique` and `not_null` tests on primary / surrogate keys
  - At least one other test type (`accepted_values`, `relationships`, or a custom singular test)
- Write documentation:
  - Descriptions for all models and key columns
  - Document any assumptions or business logic applied
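For reference, the test and documentation expectations above can live together in one `schema.yml`. A minimal sketch (model and column names from the required list; extend to your own models):

```yaml
# models/staging/schema.yml (illustrative sketch)
version: 2

models:
  - name: stg_ad_events
    description: "Deduplicated, cleaned ad events from raw.ad_events"
    columns:
      - name: event_id
        description: "Unique event identifier (primary key)"
        tests:
          - unique
          - not_null
      - name: event_type
        tests:
          - accepted_values:
              values: ['impression', 'click', 'viewable_impression']
```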
Connect your dbt project to Lightdash and build a dashboard that answers these business questions:
- Revenue Overview — Total revenue over time, broken down by publisher.
- Fill Rate Analysis — What is the fill rate by publisher? Any publishers that stand out?
- One insight of your choice — What else is interesting or concerning in this data?
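The brief does not pin down `fill_rate`, so document whichever definition you choose. One common definition is filled requests over total requests; a sketch of that at a daily/publisher grain (whether unfilled rows also carry `event_type = 'impression'` is something to verify against the data first):

```sql
-- One possible fill_rate definition (sketch; adjust to your chosen grain)
SELECT
    toDate(event_timestamp) AS event_date,
    publisher_id,
    sum(is_filled) / count(*) AS fill_rate
FROM raw.ad_events
WHERE event_type = 'impression'
GROUP BY event_date, publisher_id
ORDER BY event_date, publisher_id;
```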
Write a `DESIGN.md` in the repo root covering:
- Your data modeling approach — Why did you structure it this way?
- Data quality issues — What did you find? How did you handle each?
- Trade-offs — What shortcuts did you take? What would you change with more time?
- Production readiness — How would you extend this for a real production deployment?
While exploring the data, you should notice anomalous patterns that suggest a data quality or business integrity issue.
Document what you find, why it matters from a business perspective, and how you would handle it in a production analytics pipeline.
Hint: Not all traffic is created equal.
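Without giving the answer away, one generic way to follow a hint like this is to compare engagement ratios across segments. For example, a sketch of per-publisher CTR (what counts as "implausible" is a judgment call, and this is only one of several angles worth checking):

```sql
-- Click-through rate by publisher; an implausibly high CTR can
-- indicate invalid traffic
SELECT
    publisher_id,
    countIf(event_type = 'click') AS clicks,
    countIf(event_type = 'impression') AS impressions,
    round(clicks / impressions, 4) AS ctr
FROM raw.ad_events
GROUP BY publisher_id
HAVING impressions > 0
ORDER BY ctr DESC
LIMIT 10;
```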
| # | Deliverable | Format |
|---|---|---|
| 1 | Working dbt project | Models, tests, docs in dbt/ |
| 2 | Lightdash dashboard | Dashboard artifact/code committed in repo |
| 3 | Design document | DESIGN.md in repo root |
We will ask you to walk us through your solution in a follow-up interview.
```shell
make up        # Start all services
make dbt-deps  # Install dbt packages
make dbt-run   # Run all models
make dbt-test  # Run all tests
```
Then we'll check:
- Models build without errors
- Tests pass
- Tables exist in the ClickHouse `analytics` database and contain rows
- Lightdash dashboard is functional
- `DESIGN.md` is thoughtful and complete
| Criteria | Weight | What we're looking for |
|---|---|---|
| Data Modeling | 30% | Correct grain, clean staging → marts separation, appropriate column types and naming |
| dbt Best Practices | 20% | Proper use of sources, refs, tests, docs, config; consistent naming conventions |
| Data Quality Handling | 15% | Identifying issues in raw data, appropriate handling in staging, clear documentation |
| Dashboard & Analysis | 25% | Insightful visualisations that answer the business questions clearly |
| Communication | 10% | Clear design doc, meaningful commit messages, coherent explanation of decisions |
2–4 hours. Focus on quality over quantity. A well-modeled subset with thorough tests and documentation is better than a sprawling project with no tests.
- Commit frequently with meaningful, descriptive commit messages. We want to see your iterative thought process — not a single "final" commit.
- Document your assumptions. When the data or requirements are ambiguous, state your assumption and proceed.
- The raw data contains many intentional data quality issues. Finding and handling them is a core part of this exercise.
- You may use any dbt packages you find helpful (e.g., `dbt-utils`, `dbt-expectations`).
- If you get stuck on infrastructure, document the issue and focus on the modeling work.
- Do not modify `clickhouse/init-db.sql` or `docker-compose.yml` (unless fixing a genuine bug; document it if you do).
```shell
make up               # Start all services
make down             # Stop all services
make restart          # Restart all services
make logs             # Follow service logs
make dbt-deps         # Install dbt packages
make dbt-run          # Run all dbt models
make dbt-test         # Run all dbt tests
make dbt-compile      # Compile dbt project (for Lightdash)
make dbt-docs         # Generate dbt documentation
make dbt-shell        # Open a bash shell in the dbt container
make lightdash-login  # Authenticate Lightdash CLI inside Docker
make lightdash-download-dashboard LIGHTDASH_DASHBOARD=<slug|uuid|url>  # Export one dashboard as code via Docker CLI
make lightdash-download-all  # Export all dashboards/charts as code via Docker CLI
make clickhouse-client       # Open ClickHouse CLI
make clean                   # Remove all volumes and start fresh
```