Author: Ross Stafford • PMP® | Data & Product PM
Live repo: https://github.com/RStaff/https-github.com-rossstafford-ross-ai-portfolio
A small experiment built in 2024 to see whether store-visit data can predict sales ("alpha" = trading signal). I merged Google Analytics 4 visit counts (people near each store) with that store's point-of-sale revenue for the same weeks, then used PySpark to engineer and store model features (e.g., 7-day lag, rolling averages) in a scalable way. I trained a Random Forest model, a machine-learning algorithm, to predict next week's sales. The model's mean absolute error (MAE) was 8 % lower than a very simple benchmark ("next week = this week"), so it is measurably more accurate than the benchmark. The predictions flagged stores whose sales were likely to beat or miss expectations by a wide margin; a statistical test gave p ≈ 0.03, meaning a signal this strong would show up by chance only about 3 % of the time.

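
To make the benchmark comparison concrete, here is a minimal sketch of how a "next week = this week" baseline and the relative MAE improvement could be computed. The sales figures and the `model_pred` values are made-up placeholders, not results from this project, and the helper names are hypothetical.

```python
from statistics import mean


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def naive_forecast(series):
    """Persistence baseline: each week's forecast is the previous week's value."""
    return series[:-1]


def relative_improvement(model_mae, baseline_mae):
    """Fraction by which the model's MAE is lower than the baseline's."""
    return (baseline_mae - model_mae) / baseline_mae


# Hypothetical weekly sales for one store (illustrative numbers only).
weekly_sales = [100.0, 110.0, 105.0, 120.0, 115.0]
actual = weekly_sales[1:]                      # weeks 2..5 are the targets
baseline_pred = naive_forecast(weekly_sales)   # weeks 1..4 reused as forecasts
model_pred = [108.0, 107.0, 116.0, 117.0]      # stand-in for model output

baseline_mae = mae(actual, baseline_pred)
model_mae = mae(actual, model_pred)
print(f"baseline MAE={baseline_mae:.2f}, model MAE={model_mae:.2f}, "
      f"improvement={relative_improvement(model_mae, baseline_mae):.1%}")
```

An "8 % lower MAE" claim corresponds to `relative_improvement(...)` returning roughly 0.08 on held-out weeks.
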
| Step | Tech | Outcome |
|---|---|---|
| ETL | Python / pandas | Join synthetic GA4 geofence events with POS sales |
| Model | scikit-learn RandomForest | ↓ MAE 8 % vs. naïve baseline |
| Viz | Looker Studio | Dual-axis chart for planners (sales vs. visits) |
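
The ETL row above could look roughly like the sketch below: join visits to sales per store and week, then derive lag and rolling-average features. This uses pandas as listed in the table (not PySpark), and every column name, the 3-week window, and the sample values are assumptions made for illustration, since the real schema is not shown here.

```python
import pandas as pd

# Hypothetical schema and values; the project's real columns may differ.
weeks = pd.to_datetime(["2024-01-01", "2024-01-08", "2024-01-15", "2024-01-22"] * 2)
visits = pd.DataFrame({
    "store_id": [1, 1, 1, 1, 2, 2, 2, 2],
    "week": weeks,
    "visits": [120, 135, 128, 150, 80, 82, 90, 85],
})
sales = pd.DataFrame({
    "store_id": [1, 1, 1, 1, 2, 2, 2, 2],
    "week": weeks,
    "revenue": [1000.0, 1100.0, 1050.0, 1200.0, 600.0, 640.0, 700.0, 660.0],
})


def build_features(visits: pd.DataFrame, sales: pd.DataFrame) -> pd.DataFrame:
    """Join visits to sales and add per-store lag / rolling features."""
    # validate= raises if a (store, week) pair is duplicated on either side.
    df = visits.merge(sales, on=["store_id", "week"], how="inner",
                      validate="one_to_one")
    df = df.sort_values(["store_id", "week"]).reset_index(drop=True)

    by_store = df.groupby("store_id")["visits"]
    # With weekly rows, a one-row shift is a 7-day lag.
    df["visits_lag_1w"] = by_store.shift(1)
    # Rolling mean over up to 3 weeks, computed within each store.
    df["visits_roll_3w"] = by_store.transform(
        lambda s: s.rolling(3, min_periods=1).mean()
    )
    # Prediction target: the same store's revenue one week ahead.
    df["revenue_next_week"] = df.groupby("store_id")["revenue"].shift(-1)
    return df


features = build_features(visits, sales)
print(features.head())
```

Rows where the lag or the target is missing (the first and last week of each store) would need to be dropped before fitting a model.
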

    cd ga4_dashboard
    jupyter nbconvert --to notebook --execute foot_traffic.ipynb   # or open in JupyterLab
    open chart.png                                                 # exported Looker chart

**🛡️ Project 2: Airflow Data-Quality Guard Rails**

| Check | Logic | Alert |
|---|---|---|
| Row count | Fail if < 700 rows | Airflow task → red |
| Null ratio | Fail if any `visits` values are NULL | Upstream-failed |
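
The two checks above can be sketched as plain Python callables. This is an illustration under stated assumptions, not the repo's actual DAG code: the function names, the `DataQualityError` class, and the list-of-dicts row format are hypothetical. Only the 700-row threshold and the `visits` column come from the table.

```python
MIN_ROWS = 700  # threshold taken from the checks table


class DataQualityError(Exception):
    """Raised when a guard-rail check fails (hypothetical helper class)."""


def check_row_count(rows, min_rows=MIN_ROWS):
    """Fail if the batch has fewer than `min_rows` rows."""
    if len(rows) < min_rows:
        raise DataQualityError(f"row count {len(rows)} < {min_rows}")
    return len(rows)


def check_null_ratio(rows, column="visits"):
    """Fail if any row has a NULL (None or missing) value in `column`."""
    nulls = sum(1 for row in rows if row.get(column) is None)
    ratio = nulls / len(rows) if rows else 0.0
    if nulls:
        raise DataQualityError(f"{nulls} NULLs in '{column}' (ratio {ratio:.2%})")
    return ratio


if __name__ == "__main__":
    # Illustrative batch: 700 clean rows passes both checks.
    batch = [{"store_id": i, "visits": 10} for i in range(700)]
    print(check_row_count(batch), check_null_ratio(batch))
```

When a callable like this runs inside an Airflow task and raises, the task is marked failed and, under the default trigger rule, its downstream tasks end up in the `upstream_failed` state.
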

Architecture

    GA4 events    POS csv
        │            │
        └────────────┴──► pandas → RandomForest ──► predictions.csv
                                                          │
    Looker Studio dashboard ◄─────────────────────────────┘
        ▼
    Airflow (row & null checks) → SQLite

🛠 Tech Stack
Python 3.11, pandas, scikit-learn
Docker Compose (Airflow 2.9, Postgres, Redis)
Looker Studio • SQLite • GitHub Actions (CI / Trivy / Bandit)

License
MIT: free to fork & remix. Attribution appreciated!
