
feat: config-driven distillation pipeline for per-school fine-tuned models#93

Open
William-Hill wants to merge 18 commits into `main` from `feature/distillation-pipeline`

Conversation

@William-Hill
Collaborator

Summary

  • Adds a 5-stage Python training pipeline (distill → prepare → finetune → eval → export) that generates per-school fine-tuned Qwen 3.5 models for course explanations and query summaries
  • Replaces OpenAI API dependency in /api/courses/explain-pairing and /api/query-summary with a model-client adapter that routes to local Ollama (fine-tuned) or OpenAI (fallback)
  • Includes full Bishop State institutional config (demographics, equity gaps, interventions, workforce context, etc.) that gets baked into model training data
  • Supports dual teacher backends: Claude Sonnet (production quality, ~$20/school) or Qwen 3.5 27B via Ollama (free iteration)
  • Fine-tunes via MLX QLoRA on Apple Silicon — 9B model on 36GB Mac, 4B on 18GB Mac
  • New school = write a config.yaml + seed_queries.yaml, run 5 commands
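As a rough illustration of the two-file onboarding step, a new school's `config.yaml` might look something like the sketch below. All field names here are assumptions for illustration only, not the actual schema this PR defines:

```yaml
# schools/example-college/config.yaml -- illustrative sketch; every key name
# is an assumption, not the schema shipped in this PR.
school:
  slug: example-college
  display_name: Example Community College
teacher:
  backend: claude          # or a local Qwen teacher via Ollama for free iteration
finetune:
  base_model: qwen3.5-9b   # per the summary: 9B fits a 36GB Mac, 4B an 18GB Mac
context:
  demographics: ...        # institutional context baked into training data
  equity_gaps: ...
```

A matching `seed_queries.yaml` would then supply the seed queries (28 in the Bishop State example) that the distillation stage expands on.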

New files

| Path | Purpose |
| --- | --- |
| `training/` | Python pipeline: config, prompts, seed, distill, prepare, finetune, eval, export |
| `schools/bishop-state/` | Institutional config + 28 seed queries |
| `tests/training/` | 72 pytest tests |
| `codebenders-dashboard/lib/model-client.ts` | Ollama/OpenAI routing adapter |
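The routing adapter's core decision might be sketched as follows. The names `resolveBackend`, `ModelClientConfig`, and `preferLocal` are assumptions for illustration; the real `model-client.ts` API may differ:

```typescript
// Illustrative sketch of Ollama/OpenAI routing; not the PR's actual code.
type Backend = "ollama" | "openai";

interface ModelClientConfig {
  preferLocal: boolean;    // assumption: an env-driven toggle per deployment
  ollamaBaseUrl?: string;  // assumption: e.g. "http://localhost:11434"
}

// Route to the local fine-tuned model when configured; otherwise fall back
// to OpenAI, matching the fallback behavior described in the summary.
export function resolveBackend(cfg: ModelClientConfig): Backend {
  return cfg.preferLocal && cfg.ollamaBaseUrl ? "ollama" : "openai";
}
```

Keeping the decision in one small pure function also makes the lazy-init of the OpenAI client (mentioned in the commits below) straightforward: the OpenAI SDK is only constructed when this returns `"openai"`.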

Test plan

  • 72 Python tests pass (`pytest tests/training/ -v`)
  • Dashboard builds cleanly (`npm run build`)
  • All 5 CLI entry points respond to `--help`
  • Bishop State config loads end-to-end
  • Run `python -m training.distill --school bishop-state --local` with Ollama
  • Run the full pipeline through export and verify the Ollama model serves responses

🤖 Generated with Claude Code

William-Hill and others added 17 commits February 24, 2026 13:08
Per-school fine-tuning pipeline to replace OpenAI dependency for
explanation and summarization endpoints with locally-served Qwen 3.5
models via MLX and Ollama.
…ed utilities

- Unify generate_explainer/summarizer_pairs into single generate_pairs()
- Extract read_jsonl() and get_message_content() to config.py
- Replace 3 duplicate _extract_*_content helpers with get_message_content
- Lazy-init OpenAI client in model-client.ts (skip when using Ollama)
- Extract shared generate() helper in model-client.ts
- Move import re to module level in eval.py
- Remove redundant config mutation in finetune.py
- Batch flush every 25 records instead of every record
- Remove unnecessary what-comments
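The batched-flush change in the last bullet can be sketched like this. The function name, signature, and file handling are assumptions based only on the commit message:

```python
import json

FLUSH_EVERY = 25  # batch size taken from the commit message


def write_jsonl(path, records, flush_every=FLUSH_EVERY):
    """Append dicts as JSON lines, flushing every `flush_every` records
    instead of after every write. Illustrative sketch, not the PR's code."""
    with open(path, "a", encoding="utf-8") as f:
        for i, rec in enumerate(records, start=1):
            f.write(json.dumps(rec) + "\n")
            if i % flush_every == 0:  # amortize flush cost across the batch
                f.flush()
```

Flushing per batch rather than per record cuts syscall overhead during distillation while still bounding how much output is lost if a run is interrupted.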
@William-Hill
Collaborator Author

@coderabbitai review

- Pass maxTokens through to Ollama backend (was hardcoded to 1024)
- Remove unused variables: seed_queries in distill.py, config in eval.py and export.py
- Fix num_layers using lora_rank instead of dedicated config key in finetune.py
- Remove duplicate pyyaml entry in requirements.txt
- Clean up unused imports
