
feat: config-driven distillation pipeline for per-school fine-tuned models#93

Open
William-Hill wants to merge 18 commits into `main` from `feature/distillation-pipeline`

Conversation

@William-Hill
Collaborator

Summary

  • Adds a 5-stage Python training pipeline (distill → prepare → finetune → eval → export) that generates per-school fine-tuned Qwen 3.5 models for course explanations and query summaries
  • Replaces OpenAI API dependency in /api/courses/explain-pairing and /api/query-summary with a model-client adapter that routes to local Ollama (fine-tuned) or OpenAI (fallback)
  • Includes full Bishop State institutional config (demographics, equity gaps, interventions, workforce context, etc.) that gets baked into model training data
  • Supports dual teacher backends: Claude Sonnet (production quality, ~$20/school) or Qwen 3.5 27B via Ollama (free iteration)
  • Fine-tunes via MLX QLoRA on Apple Silicon — 9B model on 36GB Mac, 4B on 18GB Mac
  • New school = write a config.yaml + seed_queries.yaml, run 5 commands
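As a rough illustration of the two-file onboarding step, a new school's `config.yaml` might look something like the sketch below. All field names here are assumptions for illustration only, not the actual schema this PR defines:

```yaml
# schools/example-college/config.yaml -- illustrative sketch; every key name
# is an assumption, not the schema shipped in this PR.
school:
  slug: example-college
  display_name: Example Community College
teacher:
  backend: claude          # or a local Qwen teacher via Ollama for free iteration
finetune:
  base_model: qwen3.5-9b   # per the summary: 9B fits a 36GB Mac, 4B an 18GB Mac
context:
  demographics: ...        # institutional context baked into training data
  equity_gaps: ...
```

A matching `seed_queries.yaml` would then supply the seed queries (28 in the Bishop State example) that the distillation stage expands on.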

New files

| Path | Purpose |
| --- | --- |
| `training/` | Python pipeline: config, prompts, seed, distill, prepare, finetune, eval, export |
| `schools/bishop-state/` | Institutional config + 28 seed queries |
| `tests/training/` | 72 pytest tests |
| `codebenders-dashboard/lib/model-client.ts` | Ollama/OpenAI routing adapter |
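The routing adapter's core decision might be sketched as follows. The names `resolveBackend`, `ModelClientConfig`, and `preferLocal` are assumptions for illustration; the real `model-client.ts` API may differ:

```typescript
// Illustrative sketch of Ollama/OpenAI routing; not the PR's actual code.
type Backend = "ollama" | "openai";

interface ModelClientConfig {
  preferLocal: boolean;    // assumption: an env-driven toggle per deployment
  ollamaBaseUrl?: string;  // assumption: e.g. "http://localhost:11434"
}

// Route to the local fine-tuned model when configured; otherwise fall back
// to OpenAI, matching the fallback behavior described in the summary.
export function resolveBackend(cfg: ModelClientConfig): Backend {
  return cfg.preferLocal && cfg.ollamaBaseUrl ? "ollama" : "openai";
}
```

Keeping the decision in one small pure function also makes the lazy-init of the OpenAI client (mentioned in the commits below) straightforward: the OpenAI SDK is only constructed when this returns `"openai"`.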

Test plan

  • 72 Python tests pass (`pytest tests/training/ -v`)
  • Dashboard builds cleanly (`npm run build`)
  • All 5 CLI entry points respond to `--help`
  • Bishop State config loads end-to-end
  • Run `python -m training.distill --school bishop-state --local` with Ollama
  • Run the full pipeline through export and verify the Ollama model serves responses

🤖 Generated with Claude Code

William-Hill and others added 17 commits February 24, 2026 13:08
Per-school fine-tuning pipeline to replace OpenAI dependency for
explanation and summarization endpoints with locally-served Qwen 3.5
models via MLX and Ollama.
…ed utilities

- Unify generate_explainer/summarizer_pairs into single generate_pairs()
- Extract read_jsonl() and get_message_content() to config.py
- Replace 3 duplicate _extract_*_content helpers with get_message_content
- Lazy-init OpenAI client in model-client.ts (skip when using Ollama)
- Extract shared generate() helper in model-client.ts
- Move import re to module level in eval.py
- Remove redundant config mutation in finetune.py
- Batch flush every 25 records instead of every record
- Remove unnecessary what-comments
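The batched-flush change in the last bullet can be sketched like this. The function name, signature, and file handling are assumptions based only on the commit message:

```python
import json

FLUSH_EVERY = 25  # batch size taken from the commit message


def write_jsonl(path, records, flush_every=FLUSH_EVERY):
    """Append dicts as JSON lines, flushing every `flush_every` records
    instead of after every write. Illustrative sketch, not the PR's code."""
    with open(path, "a", encoding="utf-8") as f:
        for i, rec in enumerate(records, start=1):
            f.write(json.dumps(rec) + "\n")
            if i % flush_every == 0:  # amortize flush cost across the batch
                f.flush()
```

Flushing per batch rather than per record cuts syscall overhead during distillation while still bounding how much output is lost if a run is interrupted.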
@William-Hill
Collaborator Author

@coderabbitai review

- Pass maxTokens through to Ollama backend (was hardcoded to 1024)
- Remove unused variables: seed_queries in distill.py, config in eval.py and export.py
- Fix num_layers using lora_rank instead of dedicated config key in finetune.py
- Remove duplicate pyyaml entry in requirements.txt
- Clean up unused imports
