Draft
40 commits
ccd363a
Create ADSWP Project
Cillian-Williamson Aug 5, 2025
0a1e77e
Delete ADSWP Project
Cillian-Williamson Aug 5, 2025
20245a7
Create Placeholder
Cillian-Williamson Aug 5, 2025
e8135d4
Adding French Cars and Australian Cars analyses
Cillian-Williamson Aug 5, 2025
ea8befd
Delete ADSWP Project/Placeholder
Cillian-Williamson Aug 5, 2025
833b824
Adding EDA for enhancing baseline models by making use of embedding e…
scotthawes Aug 19, 2025
bc84ca3
Adding freMTPL analysis (Python Colab Notebook)
Cillian-Williamson Aug 20, 2025
51b1536
GBM v Tabpfn on a CASDataset (usautoBI)
gundamp Sep 6, 2025
6e62483
Add local baselining helpers for data loading, evaluation, and model …
scotthawes Oct 6, 2025
4552ab7
Update CatBoost training logs and time left estimates
scotthawes Oct 6, 2025
fb310df
Add files via upload
Cillian-Williamson Oct 8, 2025
8ee5283
Merge pull request #1 from IFoA-ADSWP/eda/baselining_notebook
scotthawes Oct 8, 2025
1ac5c7e
Merge pull request #2 from IFoA-ADSWP/main
scotthawes Oct 15, 2025
864461d
Add files via upload
Cillian-Williamson Nov 5, 2025
45f2460
Add files via upload
scotthawes Nov 26, 2025
60b7e13
Add files via upload
scotthawes Nov 26, 2025
d1903b8
feat: sync baseline utilities and notebook naming updates
scotthawes Mar 29, 2026
b7e344c
merge: resolve conflicts with origin/main - relocate docs, accept new…
scotthawes Mar 29, 2026
de38329
Merge pull request #3 from IFoA-ADSWP/eda/baselining_notebook
scotthawes Mar 29, 2026
6c672b9
upgrade tabpfn to 7.0.1 (model v2.6): update API params
scotthawes Mar 29, 2026
dcb75bc
fix notebook for tabpfn 7.x (model v2.6)
scotthawes Mar 29, 2026
cf7e6fb
rerun experiment with tabpfn client backend: fix backend-aware instan…
scotthawes Mar 29, 2026
09c2669
feat: add multi-dataset insurance benchmark
scotthawes Mar 29, 2026
c5aa5dc
Add time_left.tsv to track iteration progress and time estimates for …
scotthawes Mar 29, 2026
54e1880
Merge branch 'main' into feature/tabpfn-v2.5-update
scotthawes Mar 29, 2026
41fae3e
Merge pull request #4 from IFoA-ADSWP:feature/tabpfn-v2.5-update
scotthawes Mar 29, 2026
994d98f
feat: add reproducibility appendix and follow-up analysis for TabPFN …
scotthawes Mar 29, 2026
0d0147f
Merge pull request #5 from IFoA-ADSWP:feature/tabpfn-v2.5-update
scotthawes Mar 29, 2026
5179073
Add scripts for TabPFN fine-tuning trials and evaluations
scotthawes Apr 2, 2026
c0120f8
feat: add Stage A and B findings report with key insights and recomme…
scotthawes Apr 2, 2026
bdcbcf3
Merge pull request #6 from IFoA-ADSWP/feature/tabpfn-v2.5-update
scotthawes Apr 2, 2026
722a9c5
reproducability section
scotthawes Apr 2, 2026
f2db98a
Merge pull request #7 from IFoA-ADSWP:feature/tabpfn-v2.5-update
scotthawes Apr 2, 2026
63a6fb0
Adding note to clarify whether regressor or classifier was used
scotthawes Apr 2, 2026
b1e9253
Merge pull request #8 from IFoA-ADSWP:feature/tabpfn-v2.5-update
scotthawes Apr 2, 2026
693cded
Updating technical funding request
scotthawes Apr 2, 2026
55893c7
Merge pull request #9 from IFoA-ADSWP:feature/tabpfn-v2.5-update
scotthawes Apr 2, 2026
11df5be
Delete docs/reports/BEFORE_AFTER_COMPARISON.md
scotthawes Apr 2, 2026
cc52ba4
Delete docs/reports/UNIFIED_PAPER_FINAL.md
scotthawes Apr 2, 2026
a169c06
Delete docs/reports/ARTICLE_REVISED_COMPLETE.md
scotthawes Apr 2, 2026
50 changes: 50 additions & 0 deletions .github/agents/tabpfn-data-science.agent.md
@@ -0,0 +1,50 @@
---
description: "Use when performing TabPFN data science work: classification or regression experiments, preprocessing, benchmarking, fine-tuning, save/load checks, pilot subsets, device comparisons, notebook support, and reproducible experiment reporting."
name: "TabPFN Data Science"
user-invocable: true
---
You are a TabPFN-focused data science agent for practical experimentation and applied modeling work.

Your goal is to use the full TabPFN workflow effectively: data inspection, pilot experiments, classifier/regressor runs, fine-tuning, save/load validation, notebook support, benchmarking, and clear experiment reporting.

Use the workspace TabPFN skills when relevant: `tabpfn-explore`, `tabpfn-classify`, `tabpfn-regress`, `tabpfn-finetune`, and `tabpfn-benchmark`.

## Scope
- TabPFN classification and regression workflows
- Data inspection, preprocessing, and target validation for tabular data
- Fine-tuning smoke tests, readiness checks, and pilot runs
- Device comparisons (`cpu` vs `mps` vs `cuda` when available)
- Save/load validation and environment-path verification
- Notebook-oriented experimentation and script-based runs
- Benchmarking, runtime profiling, and experiment result summaries

## Constraints
- Prefer the smallest trial that answers the question.
- Reuse identical settings when comparing devices.
- Avoid unnecessary source edits, but make targeted code changes when they are the best way to unblock reliable experiments.
- Use temporary files/paths for one-off experiments when possible.
- If environment mismatch is detected (for example, wrong `tabpfn` package path), fix execution context before concluding.
- Be aware this workspace runs on Apple Silicon (`macOS`, M1-class hardware). For the tested local fine-tuning smoke workload on this machine, `cpu` performed better than `mps`. Treat `cpu` as the default starting point for small local fine-tuning trials, but re-check empirically as workload size changes.

## Workflow
1. Confirm objective, dataset, and target column.
2. Inspect data shape, feature types, missingness, and target/task suitability for TabPFN.
3. Start with a tiny, stratified sample for smoke testing or baseline validation.
4. Choose the appropriate TabPFN path: classifier, regressor, fine-tune, benchmark, save/load, or notebook workflow.
5. When comparing devices, run the same script/config across candidates for fair comparison.
6. Capture key metrics: wall time, memory, and task metric(s) such as ROC AUC, log loss, accuracy, RMSE, or MAE.
7. Recommend next scale-up steps, environment fixes, or code changes based on evidence.
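Step 3 of the workflow can be sketched with a stdlib-only helper; the function name and shape here are illustrative, not part of the TabPFN API:

```python
# Hypothetical helper for step 3: draw a tiny stratified pilot subset
# that preserves label proportions before any smoke test.
import random
from collections import defaultdict

def stratified_pilot(rows, label_of, n_total, seed=0):
    """Return ~n_total rows sampled proportionally per label."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)
    pilot = []
    for label, group in by_label.items():
        # Keep each label's share of the pilot close to its share of the data.
        k = max(1, round(n_total * len(group) / len(rows)))
        pilot.extend(rng.sample(group, min(k, len(group))))
    rng.shuffle(pilot)
    return pilot
```

For a 90/10 binary target and `n_total=20`, this yields roughly 18/2 rows, matching the original class balance.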

## Reporting Format
Return concise results in this order:
1. What was run (data size, device, epochs/context, command shape)
2. Performance table (time, memory, metric)
3. Decision (best device for current workload)
4. Next run recommendation (one step up in scale)

## Practical Defaults
- Smoke test size: 128 to 500 rows
- Epochs: 1 for first pass
- Context samples: 64 to 128
- Always include a timing/memory capture pass before scaling
- Prefer the local upstream TabPFN source tree over an unrelated installed package when the task depends on repository-specific APIs such as fine-tuning utilities
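The "timing/memory capture pass" default can be implemented with the standard library alone; this is a sketch of one way to wrap an experiment callable, not TabPFN's own instrumentation:

```python
# Wrap any experiment callable to capture wall time and peak traced
# memory before deciding whether to scale the run up.
import time
import tracemalloc

def measure(fn, *args, **kwargs):
    """Return (result, wall_seconds, peak_bytes) for one call."""
    tracemalloc.start()
    t0 = time.perf_counter()
    result = fn(*args, **kwargs)
    wall = time.perf_counter() - t0
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, wall, peak
```

Note `tracemalloc` only tracks Python-level allocations; native tensor memory would need a separate probe.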
24 changes: 24 additions & 0 deletions .github/skills/tabpfn-benchmark/SKILL.md
@@ -0,0 +1,24 @@
---
name: tabpfn-benchmark
description: 'Benchmark TabPFN runs across devices or configurations. Use for CPU vs MPS vs CUDA comparisons, timing and memory capture, pilot benchmark tables, and evidence-based decisions about which setup to scale.'
argument-hint: 'Dataset path, target column, devices or configs to compare, and trial size'
---

# TabPFN Benchmark

## When to Use
- Comparing `cpu`, `mps`, and `cuda` on the same TabPFN workload
- Comparing context sizes, sample counts, or epochs
- Producing benchmark tables for planning larger runs
- Verifying whether a faster-looking device is actually better on current workload size

## Procedure
1. Fix one workload definition: dataset, target, rows, seed, epochs, and context size.
2. Run each candidate device or config with identical settings.
3. Capture wall time, memory footprint, and task metric.
4. Report results in a compact comparison table.
5. Recommend the single best next configuration to scale.
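Step 4's compact comparison table can be rendered from plain dicts; the column names below are assumptions, not a fixed schema:

```python
# Illustrative formatter for a compact benchmark comparison table.
def benchmark_table(rows):
    """rows: list of dicts, e.g. {"device": "cpu", "wall_s": 12.3}."""
    headers = list(rows[0])
    widths = [max(len(h), *(len(str(r[h])) for r in rows)) for h in headers]
    lines = [" | ".join(h.ljust(w) for h, w in zip(headers, widths))]
    lines.append("-|-".join("-" * w for w in widths))
    for r in rows:
        lines.append(" | ".join(str(r[h]).ljust(w) for h, w in zip(headers, widths)))
    return "\n".join(lines)
```

Keeping the table as plain text makes it easy to paste directly into the experiment report.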

## Local Workspace Guidance
- On this Apple Silicon machine, small fine-tuning smoke tests favored `cpu` over `mps`.
- Do not assume that result holds for larger runs; benchmark again when scale changes materially.
27 changes: 27 additions & 0 deletions .github/skills/tabpfn-classify/SKILL.md
@@ -0,0 +1,27 @@
---
name: tabpfn-classify
description: 'Run TabPFN classification experiments. Use for binary or multiclass tabular classification, pilot baselines, preprocessing checks, probability evaluation, and reproducible classification result reporting.'
argument-hint: 'Dataset path, target column, metric, and run size'
---

# TabPFN Classify

## When to Use
- Binary or multiclass classification with TabPFN
- Small pilot baselines before scaling up
- Probability-based evaluation such as ROC AUC or log loss
- Comparing preprocessing or environment choices on the same classification task

## Procedure
1. Confirm dataset path, target column, and evaluation metric.
2. Validate that the target is suitable for classification.
3. Start with a pilot subset if runtime or memory is uncertain.
4. Run a baseline classifier with explicit seed and logged settings.
5. Capture time, memory, and classification metrics.
6. Recommend one next change only: more rows, more estimators, preprocessing adjustment, or a device comparison.

## Reporting
1. What was run
2. Metric table
3. Whether the baseline is stable enough to scale
4. One concrete next step
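The log-loss metric mentioned above can be checked without heavy dependencies; a minimal binary-case sketch (not the TabPFN evaluation code itself):

```python
# Mean negative log-likelihood of predicted positive-class probabilities.
import math

def binary_log_loss(y_true, p_pred, eps=1e-15):
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)
```

A confident correct prediction (`y=1`, `p=0.9`) contributes `-ln(0.9) ≈ 0.105`; a confident wrong one is penalized much more heavily.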
27 changes: 27 additions & 0 deletions .github/skills/tabpfn-explore/SKILL.md
@@ -0,0 +1,27 @@
---
name: tabpfn-explore
description: 'Inspect and validate tabular datasets for TabPFN. Use for schema checks, target validation, missingness review, pilot subset creation, environment-path checks, and deciding whether to run classifier, regressor, fine-tune, or benchmark workflows.'
argument-hint: 'Dataset path, target column, and what you want to validate'
---

# TabPFN Explore

## When to Use
- Quick dataset readiness checks before a TabPFN run
- Deciding whether the task is classification or regression
- Inspecting row counts, feature types, target balance, and missingness
- Creating a tiny pilot subset before a larger experiment
- Verifying the active `tabpfn` import path and avoiding environment mismatch

## Procedure
1. Confirm the dataset path and target column.
2. Inspect row count, column count, dtypes, missingness, and target distribution.
3. Decide task type: classification if the target is categorical or has low-cardinality labels; regression if the target is continuous.
4. Create a tiny stratified subset for smoke tests when appropriate.
5. Verify whether the run should use the local upstream TabPFN source tree instead of an installed package.
6. Recommend the next TabPFN path: classification, regression, fine-tuning, benchmarking, or save/load validation.
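Step 3's task-type decision can be sketched as a simple cardinality heuristic; the threshold is an assumption for illustration, not an official TabPFN rule:

```python
# Guess 'classification' vs 'regression' from the target column alone.
def infer_task(target_values, max_classes=10):
    distinct = set(target_values)
    if all(isinstance(v, str) for v in distinct):
        return "classification"
    if len(distinct) <= max_classes:
        return "classification"  # low-cardinality numeric labels
    return "regression"
```

Treat the output as a prompt for a human check, since ID-like integer columns can fool any cardinality rule.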

## Local Workspace Guidance
- In this workspace, repository-specific APIs may exist only in the local upstream source tree.
- If fine-tuning utilities are missing from the installed package, prefer `PYTHONPATH=/Users/Scott/Documents/Data Science/ADSWP/TabPFN-upstream/src`.
- For Apple Silicon smoke tests, keep first runs small.
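The import-path check above can be done with stdlib `importlib`; the module and path comparison here are an example pattern, not a guarantee about this repository's layout:

```python
# Report where a module would be imported from, so an installed
# package is not mistaken for the local upstream source tree.
import importlib.util

def module_origin(name):
    """Return the file path a module resolves to, or None if absent."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec else None

# Usage idea: compare module_origin("tabpfn") against the expected
# local source tree before relying on repo-specific APIs.
```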
33 changes: 33 additions & 0 deletions .github/skills/tabpfn-finetune/SKILL.md
@@ -0,0 +1,33 @@
---
name: tabpfn-finetune
description: 'Run TabPFN fine-tuning workflows. Use for fine-tuning smoke tests, local hardware readiness, one-epoch pilot runs, context-size trials, save/load checks, and environment fixes when local upstream APIs differ from an installed package.'
argument-hint: 'Dataset path, target column, rows, device, epochs, and context samples'
---

# TabPFN Finetune

## When to Use
- Testing whether local hardware can execute TabPFN fine-tuning
- Running one-epoch smoke tests before larger fine-tuning jobs
- Comparing `cpu`, `mps`, and `cuda` using identical settings
- Validating save/load paths for fine-tuned models
- Fixing environment-path issues where installed `tabpfn` differs from the repo version

## Procedure
1. Confirm dataset path, target column, device candidates, and target metric.
2. Verify the import path and prefer the local upstream source tree when fine-tuning APIs are repo-specific.
3. Create a tiny stratified subset for the first run.
4. Run a one-epoch or one-step fine-tuning smoke test with timing and memory capture.
5. Re-run the same config across devices for a fair comparison.
6. Recommend the safest next scale-up step.
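Step 5's fair comparison can be driven by a small harness; `run_trial` below is a stand-in for the actual fine-tuning call, not a TabPFN API:

```python
# Run one identical experiment config per device candidate and
# collect the metric plus wall time for each.
import time

def compare_devices(run_trial, config, devices=("cpu", "mps")):
    """Return {device: (metric, wall_seconds)} for identical settings."""
    results = {}
    for device in devices:
        t0 = time.perf_counter()
        metric = run_trial(device=device, **config)
        results[device] = (metric, time.perf_counter() - t0)
    return results
```

Because the config dict is passed unchanged to every device, any observed difference is attributable to the device rather than the settings.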

## Apple Silicon Guidance
- This workspace runs on macOS Apple Silicon.
- For the tested local smoke workload on this machine, `cpu` outperformed `mps`.
- Start with `cpu` for small local fine-tuning trials, then re-check empirically as row count or context size increases.

## Practical Defaults
- Rows: 128 to 500 for first pass
- Epochs: 1
- Context samples: 64 to 128
- Always record wall time and memory before scaling
27 changes: 27 additions & 0 deletions .github/skills/tabpfn-regress/SKILL.md
@@ -0,0 +1,27 @@
---
name: tabpfn-regress
description: 'Run TabPFN regression experiments. Use for continuous targets, pilot regression baselines, target sanity checks, save/load validation, and reproducible regression result reporting.'
argument-hint: 'Dataset path, target column, metric, and run size'
---

# TabPFN Regress

## When to Use
- Continuous-target regression with TabPFN
- Small pilot regression baselines before scaling up
- RMSE, MAE, or similar regression evaluation
- Checking whether target transformation or scaling is worth testing

## Procedure
1. Confirm dataset path, target column, and metric.
2. Validate that the target is genuinely continuous rather than a small set of discrete values.
3. Start with a small subset if runtime or memory is uncertain.
4. Run a baseline regressor with explicit seed and logged settings.
5. Capture time, memory, and regression metrics.
6. Recommend one next change only: more rows, target transformation, preprocessing adjustment, or a benchmark comparison.

## Reporting
1. What was run
2. Metric table
3. Whether the baseline is ready to scale
4. One concrete next step
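The target-transformation check above can start from a round-trippable `log1p` pair; a stdlib-only sketch for right-skewed non-negative targets:

```python
# Compress a skewed non-negative target for fitting, then map model
# outputs back to the original scale for metric reporting.
import math

def transform_target(y):
    return [math.log1p(v) for v in y]

def invert_target(y_log):
    return [math.expm1(v) for v in y_log]
```

Fit on the transformed scale, but always compute RMSE or MAE on the inverted predictions so results stay comparable to the untransformed baseline.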