Ricky042/KaggleMarchMadness

🏆 MarchMadness2026

Kaggle competition entry — probabilistic prediction system for the 2026 NCAA March Madness tournament. Trained on historical data spanning 40+ years, with a custom Elo ranking engine and feature engineering pipeline that outperformed public betting baselines.

Python · scikit-learn · pandas · Kaggle


Results

  • Competition: March Machine Learning Mania 2026 (Kaggle)
  • Metric: Brier score and log-loss on predicted tournament outcome probabilities
  • Outcome: Outperformed the public betting-odds baseline on held-out tournament years
  • Final score: Not yet determined


Approach

1. Custom Elo Rating System

Designed from scratch to model dynamic team strength over time:

  • K-factor tuned per era to account for changes in game pace and competition structure
  • Home court advantage modelled as a rating offset
  • Season decay applied to prevent historical dominance from overwhelming recent form
  • Ratings updated after every game across 40+ years of historical data
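A minimal sketch of an Elo engine with these four ingredients follows. It is not the repository's /elo code: the parameter values (`BASE_RATING`, `HOME_ADVANTAGE`, `SEASON_CARRYOVER`), the two-era `k_factor` rule and the `EloEngine` class are hypothetical placeholders, and season decay is assumed here to mean regression toward the base rating at the start of each season. The `winner_loc` codes follow the H/A/N convention of the `WLoc` column in Kaggle's results files.

```python
from collections import defaultdict

# Hypothetical parameters for illustration; the tuned values are not published.
BASE_RATING = 1500.0
HOME_ADVANTAGE = 100.0   # rating-point offset for the home team (assumed)
SEASON_CARRYOVER = 0.75  # share of each team's distance from BASE_RATING kept between seasons (assumed)


def k_factor(season: int) -> float:
    """Hypothetical era-dependent K-factor (two eras, split at 2000)."""
    return 20.0 if season < 2000 else 25.0


def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expectation that A beats B on the 400-point logistic scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


class EloEngine:
    def __init__(self) -> None:
        # Unseen teams start at BASE_RATING.
        self.ratings = defaultdict(lambda: BASE_RATING)
        self.current_season = None

    def start_season(self, season: int) -> None:
        """Season decay: pull every rating part of the way back toward BASE_RATING."""
        for team in list(self.ratings):
            gap = self.ratings[team] - BASE_RATING
            self.ratings[team] = BASE_RATING + SEASON_CARRYOVER * gap
        self.current_season = season

    def update(self, season: int, winner: int, loser: int, winner_loc: str = "N") -> float:
        """Apply one game (games must arrive in chronological order) and return
        the pre-game win probability the model gave the winner."""
        if season != self.current_season:
            self.start_season(season)
        r_w, r_l = self.ratings[winner], self.ratings[loser]
        # Home court as a rating offset: it shifts the expectation only,
        # never the stored ratings. "H" = winner at home, "A" = away, "N" = neutral.
        offset = {"H": HOME_ADVANTAGE, "A": -HOME_ADVANTAGE, "N": 0.0}[winner_loc]
        p_win = expected_score(r_w + offset, r_l)
        delta = k_factor(season) * (1.0 - p_win)
        self.ratings[winner] = r_w + delta
        self.ratings[loser] = r_l - delta
        return p_win


# Usage: a home win by team 1101 over team 1102 in 2025.
# engine = EloEngine()
# engine.update(2025, winner=1101, loser=1102, winner_loc="H")
```

Because the winner gains exactly what the loser gives up, the total rating pool stays constant within a season, and a win that the model already expected (high `p_win`) moves the ratings only slightly.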

2. Feature Engineering

  • Novel seeding and ranking features built on top of Elo ratings
  • Aggregate team stats per season: offensive/defensive efficiency, strength of schedule (SOS)
  • Historical head-to-head matchup records
  • Upset frequency modelled by seed differential
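The seed-differential upset rate and Elo-based matchup features could be computed along these lines with pandas. This is an illustrative sketch, not the repository's /features code: the column names `WSeed`, `LSeed`, `Elo`, `SOS` and `SeedNum`, and both helper functions, are assumptions. Seed strings follow Kaggle's format of a region letter, a two-digit seed and an optional play-in suffix (for example `W01` or `X16a`).

```python
import pandas as pd


def seed_number(seed: str) -> int:
    """'W01' -> 1, 'X16a' -> 16: drop the region letter and any play-in suffix."""
    return int(seed[1:3])


def upset_rate_by_seed_gap(games: pd.DataFrame) -> pd.DataFrame:
    """Historical upset rate per seed gap.

    `games` is assumed to hold one row per tournament game with the winner's
    and loser's seed strings in columns WSeed and LSeed.
    """
    w = games["WSeed"].map(seed_number)
    l = games["LSeed"].map(seed_number)
    # An upset: the winner had the numerically larger (worse) seed.
    df = pd.DataFrame({"SeedGap": (w - l).abs(), "Upset": w > l})
    df = df[df["SeedGap"] > 0]  # games between equal seeds cannot be upsets
    return (
        df.groupby("SeedGap")["Upset"]
        .agg(games="count", upset_rate="mean")
        .reset_index()
    )


def matchup_features(team_stats: pd.DataFrame, season: int, team_a: int, team_b: int) -> dict:
    """A-minus-B difference features from a per-team season table.

    `team_stats` is assumed to be indexed by (Season, TeamID) with columns
    Elo, SOS and SeedNum.
    """
    a = team_stats.loc[(season, team_a), :]
    b = team_stats.loc[(season, team_b), :]
    return {f"{col}_diff": a[col] - b[col] for col in ("Elo", "SOS", "SeedNum")}
```

Expressing each matchup as differences keeps the feature vector symmetric: swapping the two teams simply flips the sign of every feature.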

3. Modelling Pipeline

  • Logistic regression baseline → gradient boosted trees → ensemble
  • Cross-validated on historical tournament data (held-out years as test sets)
  • Probabilistic outputs calibrated using Platt scaling
  • Evaluated with log-loss and scenario simulation
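The evaluation loop could look roughly like the following scikit-learn sketch, which holds out one season at a time with `LeaveOneGroupOut`, applies Platt scaling through `CalibratedClassifierCV(method="sigmoid")`, and averages the two models. The equal ensemble weights, the hyperparameters, the `season_cv` helper and the synthetic data are all assumptions for illustration; the repository's /models scripts may differ.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss, log_loss
from sklearn.model_selection import LeaveOneGroupOut


def season_cv(X: np.ndarray, y: np.ndarray, seasons: np.ndarray) -> list[dict]:
    """Leave-one-season-out evaluation of an LR + calibrated-GBT ensemble."""
    scores = []
    for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=seasons):
        X_tr, y_tr = X[train_idx], y[train_idx]
        X_te, y_te = X[test_idx], y[test_idx]

        # Baseline: logistic regression (already outputs probabilities).
        lr = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

        # Gradient boosted trees, Platt-scaled: method="sigmoid" fits a logistic
        # map from tree scores to probabilities on held-out folds of a 3-fold split.
        gbt = CalibratedClassifierCV(
            GradientBoostingClassifier(random_state=0), method="sigmoid", cv=3
        ).fit(X_tr, y_tr)

        # Equal-weight ensemble of the two probability estimates (assumed weights).
        p = 0.5 * lr.predict_proba(X_te)[:, 1] + 0.5 * gbt.predict_proba(X_te)[:, 1]

        scores.append({
            "season": int(seasons[test_idx[0]]),
            "brier": brier_score_loss(y_te, p),
            "log_loss": log_loss(y_te, p, labels=[0, 1]),
        })
    return scores


if __name__ == "__main__":
    # Synthetic stand-in for matchup features (e.g. Elo, SOS and seed differences).
    rng = np.random.default_rng(0)
    seasons = np.repeat(np.arange(2019, 2024), 60)
    X = rng.normal(size=(seasons.size, 3))
    true_logit = 1.5 * X[:, 0] - 0.8 * X[:, 2]
    y = (rng.random(seasons.size) < 1 / (1 + np.exp(-true_logit))).astype(int)
    for row in season_cv(X, y, seasons):
        print(row)
```

Grouping folds by season, rather than shuffling games, keeps every game from a test tournament out of training, which mirrors predicting a tournament that has not happened yet.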

Repo Structure

/data           — raw NCAA data from Kaggle (1985–2025)
/elo            — custom Elo rating engine
/features       — feature engineering pipeline
/models         — model training and evaluation scripts
/notebooks      — EDA and result analysis
predict.py      — generate submission predictions

How to Run

Install dependencies

pip install pandas numpy scikit-learn matplotlib

Run the full prediction pipeline

python predict.py --year 2026 --output submission.csv
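Assuming the competition keeps the submission layout of recent editions (an `ID` of the form `SSSS_XXXX_YYYY` with the lower TeamID first, and `Pred` giving the probability that the lower-ID team wins), the output file could be assembled as below. `submission_rows`, `write_submission` and the `prob_fn` hook are hypothetical names, not the actual interface of predict.py.

```python
import csv
from itertools import combinations
from typing import Callable, Iterable


def submission_rows(season: int, team_ids: Iterable[int],
                    prob_fn: Callable[[int, int], float]) -> list[tuple[str, float]]:
    """One row per possible matchup, lower TeamID first.

    prob_fn(a, b) is a hypothetical model hook returning P(team a beats team b).
    """
    rows = []
    # Sorting first guarantees a < b in every pair that combinations() yields.
    for a, b in combinations(sorted(team_ids), 2):
        rows.append((f"{season}_{a}_{b}", prob_fn(a, b)))
    return rows


def write_submission(rows: list[tuple[str, float]], path: str) -> None:
    """Write the ID,Pred CSV with a header row."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["ID", "Pred"])
        for row_id, pred in rows:
            writer.writerow([row_id, f"{pred:.6f}"])
```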

Evaluate against historical data

python models/evaluate.py --test-year 2024

Key Findings

  • Elo ratings became more predictive than seed alone from the Sweet 16 onwards
  • Upsets in the first round were best predicted by strength-of-schedule differential, not seed gap
  • Pre-tournament win probability the model assigned to the eventual champion: X% (to be added after the tournament)
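A pre-tournament title probability like the one in the last finding can be estimated by Monte Carlo simulation of the bracket from pairwise win probabilities, one common form of the scenario simulation listed under Modelling Pipeline. This is a sketch under stated assumptions, not the repository's code: `simulate_bracket` is hypothetical, the bracket is passed as a flat list in bracket order, and First Four play-in games are assumed to be resolved already.

```python
import random
from collections import Counter
from typing import Callable, Sequence


def simulate_bracket(bracket: Sequence[int], win_prob: Callable[[int, int], float],
                     n_sims: int = 10_000, seed: int = 0) -> dict[int, float]:
    """Monte Carlo estimate of each team's title probability.

    bracket: team IDs in bracket order (length a power of two), so positions
    0 and 1 meet in round one, their winner meets the winner of 2 vs 3, etc.
    win_prob(a, b): P(a beats b), e.g. from the calibrated model.
    """
    rng = random.Random(seed)
    titles = Counter()
    for _ in range(n_sims):
        alive = list(bracket)
        while len(alive) > 1:
            # Play every game of the round; adjacent survivors meet next round.
            alive = [
                a if rng.random() < win_prob(a, b) else b
                for a, b in zip(alive[0::2], alive[1::2])
            ]
        titles[alive[0]] += 1
    return {team: titles[team] / n_sims for team in bracket}
```

Simulation is used because a team's path depends on who survives the other side of each sub-bracket; multiplying per-round probabilities along one assumed path would ignore that.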