TODO 26 February

Priority 1: Qwen3.5 Model Onboarding

Plan: claude_plans/qwen3_5_onboarding_plan.md

Three Qwen3.5 models to onboard (35B-A3B, 122B-A10B, 27B dense), using the /add-model skill for each:

Phase 1: Evaluate — architecture, quant selection, sizing done
Phase 2: Download — all three models downloaded
Phase 3: Create production profiles — all three added to models.conf
Phase 4: Find sampler settings — Qwen3.5 family (thinking model, documented)
Phase 5: Test with 262K context
- 35B-A3B: ~120 t/s, 3 graph splits — excellent
- 122B-A10B: ~18 t/s, 65 graph splits — works, RAM tight
- 27B dense: CUDA crash on inference — needs investigation
Phase 6: Create bench profiles — all three added
Phase 7: Run EvalPlus HumanEval+ benchmarks (35B + 122B done, 27B blocked by the Phase 5 CUDA crash)
- 122B-A10B: 97.6% HumanEval, 94.5% HumanEval+ (#2 overall, behind Claude only)
- 35B-A3B: 95.1% HumanEval, 90.9% HumanEval+ (ties Qwen3-Coder-Next at about 3.6x the speed, 120 vs 33 t/s)
Phase 8: Update documentation (doc-keeper) — in progress
Post-onboarding: retirement decisions made (see below)

Models retired from active use (bench data and REPORT.md preserved):

GPT-OSS 120B (61 GB) — outclassed by Qwen3.5-122B (94.5% vs 87.2% HE+, similar speed)
Qwen3-Coder-Next (57 GB) — matched by Qwen3.5-35B (90.9% HE+ each, 120 vs 33 t/s)
Qwen3-Next-80B-A3B (53 GB) — replaced by Qwen3.5-122B (94.5% vs 93.9% HE+)
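
The comparisons behind these retirements can be re-derived with a few lines of Python. This is only a sketch for sanity-checking the arithmetic: the scores and throughput figures are copied from the notes above, while the names `RETIREMENTS`, `he_plus_gain`, and `speedup` are hypothetical helpers, not part of any existing tooling.

```python
# Figures are copied from the retirement notes; the structure and names
# below are illustrative assumptions, not part of the bench tooling.
RETIREMENTS = [
    # (retired model, replacement, retired HE+ %, replacement HE+ %)
    ("GPT-OSS 120B", "Qwen3.5-122B", 87.2, 94.5),
    ("Qwen3-Coder-Next", "Qwen3.5-35B", 90.9, 90.9),
    ("Qwen3-Next-80B-A3B", "Qwen3.5-122B", 93.9, 94.5),
]


def he_plus_gain(old_score: float, new_score: float) -> float:
    """HumanEval+ gain of the replacement, in percentage points."""
    return round(new_score - old_score, 1)


def speedup(old_tps: float, new_tps: float) -> float:
    """Throughput ratio of the replacement over the retired model."""
    return round(new_tps / old_tps, 1)


if __name__ == "__main__":
    for retired, replacement, old, new in RETIREMENTS:
        gain = he_plus_gain(old, new)
        print(f"{retired} -> {replacement}: {gain:+.1f} points HE+")

    # Only the Coder-Next pair has both throughput figures in the notes.
    ratio = speedup(33, 120)
    print(f"Qwen3.5-35B vs Qwen3-Coder-Next: {ratio}x tokens/s")
```

The Qwen3-Coder-Next case is the only one justified by speed rather than score, which is why the throughput ratio is computed separately from the HE+ gains.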

Active lineup going forward:

Extensive hands-on testing of claude-local — use it for real tasks, try different models (including the new Qwen3.5 models), explore edge cases. Findings and ideas feed into roadmap and future improvements.