Support Qwen3.5 by ErlisLushtaku · Pull Request #32 · OpenEuroLLM/JudgeArena

ErlisLushtaku · 2026-04-06T21:57:42Z

Updated dependencies to support Qwen3.5.
Added structured outputs so the judge emits the scores instead of generating unrelated text until it hits the token limit, which was happening frequently.

- fix dependencies - add structured output to prevent judge from not respecting the prompt

kargibora · 2026-04-07T08:28:58Z

pyproject.toml

 [project.optional-dependencies]
-vllm = ["vllm==0.10.2", "transformers>=4.55.2,<5.0.0"]
+# vLLM on PyPI pins transformers<5; optional extra matches that so `uv lock` can resolve.
+vllm = ["vllm>=0.17.0,<1.0.0", "transformers>=4.56.0,<5.0.0"]


vllm>=0.17.0,<1.0.0 is a very wide range. A few concerns:

Was this tested with a prebuilt wheel or built from source? Building vLLM from source on cluster nodes often fails due to CUDA kernel compilation issues.

Is the StructuredOutputsParams import path (vllm.sampling_params) stable across this entire range? It may have been introduced in 0.17 and could move. For example, StructuredOutputsParams was a bit different in vllm==0.11.0. So I think it makes more sense to pin a narrower, more stable version range.

Good point. I tightened the range; 0.18.1 was working. I think StructuredOutputsParams is stable across the new range.
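A minimal sketch of how the tightened range could also be checked at runtime, so an out-of-range install fails early rather than at the StructuredOutputsParams import. The helper names (parse_release, in_supported_range, check_vllm_version) are hypothetical and not part of this PR; the bounds are the ones from the tightened range (>=0.17.0,<0.19.0). Pre-release and local suffixes are ignored here, which is a simplification compared with a full PEP 440 comparison.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def parse_release(v: str) -> tuple[int, ...]:
    """Parse the leading numeric release segment, e.g. '0.18.1.post1' -> (0, 18, 1)."""
    parts = []
    for piece in v.split("+")[0].split("."):
        if not piece.isdigit():
            break  # stop at the first non-numeric piece (rc1, post1, dev0, ...)
        parts.append(int(piece))
    return tuple(parts)


def in_supported_range(v: str, lower: str = "0.17.0", upper: str = "0.19.0") -> bool:
    """Return True if lower <= v < upper, comparing numeric release tuples only."""
    return parse_release(lower) <= parse_release(v) < parse_release(upper)


def check_vllm_version() -> str | None:
    """Return the installed vLLM version, or None if the optional extra is absent."""
    try:
        installed = version("vllm")
    except PackageNotFoundError:
        return None
    if not in_supported_range(installed):
        raise RuntimeError(
            f"vllm {installed} is outside the supported range >=0.17.0,<0.19.0"
        )
    return installed
```

Calling check_vllm_version() once before building the judge would surface a version mismatch with a clear message; whether that belongs in the package or only in CI is a design choice for the maintainers.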

kargibora · 2026-04-07T08:32:08Z

judgearena/evaluate.py

+_PAIR_SCORE_MAX = 10
+
+
+def build_pair_score_output_choices() -> list[str]:


The cartesian product approach works for a single A-vs-B pair (11×11 = 121 choices), but won't scale to multi-criteria evaluation — with N dimensions it becomes 11^(2N) choices, which is unusable.

Maybe we could switch to a JSON schema constraint instead of choice, e.g. {"score_A": int, "score_B": int} per criterion. vLLM's StructuredOutputsParams already supports JSON schema constraints alongside choice, so this would be a drop-in change.

Agreed, updated
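To illustrate the scaling argument in this thread, here is a rough sketch that compares the number of enumerated choice strings with a JSON schema whose size grows linearly in the number of criteria. The function names, the nested per-criterion layout, and the criterion names are assumptions for illustration and may differ from what the PR actually implements. Passing the resulting dict to vLLM is left out; the exact StructuredOutputsParams field name should be checked against the installed version, and whether integer bounds are enforced depends on the structured-output backend.

```python
import json

PAIR_SCORE_MAX = 10  # mirrors _PAIR_SCORE_MAX from the diff; scores run 0..10


def count_choice_strings(num_criteria: int, score_max: int = PAIR_SCORE_MAX) -> int:
    """Number of strings a choice constraint must enumerate: (score_max + 1) ** (2 * N)."""
    return (score_max + 1) ** (2 * num_criteria)


def build_pair_score_schema(criteria: list[str], score_max: int = PAIR_SCORE_MAX) -> dict:
    """Build a JSON schema with one {"score_A", "score_B"} object per criterion."""
    score = {"type": "integer", "minimum": 0, "maximum": score_max}
    per_criterion = {
        "type": "object",
        "properties": {"score_A": score, "score_B": score},
        "required": ["score_A", "score_B"],
        "additionalProperties": False,
    }
    return {
        "type": "object",
        "properties": {name: per_criterion for name in criteria},
        "required": list(criteria),
        "additionalProperties": False,
    }


if __name__ == "__main__":
    print(count_choice_strings(1))  # single A-vs-B pair
    print(count_choice_strings(3))  # three criteria
    # Hypothetical criterion names, used only to show the schema shape.
    print(json.dumps(build_pair_score_schema(["helpfulness", "accuracy"]), indent=2))
```

With one criterion the choice list has 121 entries, matching the 11×11 figure above; with three criteria it already exceeds 1.7 million, while the schema only gains one small object per criterion.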

kargibora · 2026-04-07T08:33:00Z

judgearena/evaluate.py

-    )
+        )
+    if truncated_completion_count:
+        print(


Flagging for a follow-up PR: the codebase mixes print() calls for warnings, progress, and debug info, which makes it hard to filter by severity or redirect output. We should migrate to Python's logging module (or at minimum a thin wrapper like logger = logging.getLogger(__name__)). What do you think, @geoalgo?
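A possible shape for that follow-up, sketched under assumptions: the truncation warning from the diff is routed through a module-level logger instead of print(). The function name warn_truncated, the total parameter, and the message wording are hypothetical; only truncated_completion_count and the logging.getLogger(__name__) pattern come from the discussion above.

```python
import logging

# Module-level logger; handlers and levels are configured by the application.
logger = logging.getLogger(__name__)


def warn_truncated(truncated_completion_count: int, total: int) -> None:
    """Emit a WARNING record when some judge completions were cut off."""
    if truncated_completion_count:
        logger.warning(
            "%d of %d judge completions hit the token limit and were truncated",
            truncated_completion_count,
            total,
        )
```

An entry point could then call logging.basicConfig(level=logging.INFO) once, which keeps progress messages at INFO and lets users silence or redirect them without touching the warning path.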

- Switch from choice-based structured outputs to JSON schema constraint - Tighten vllm version range from >=0.17.0,<1.0.0 to >=0.17.0,<0.19.0

ErlisLushtaku and others added 5 commits April 6, 2026 23:02

update dependencies to support Qwen 3.5

c6b2b0a

slurmpilot scripts

1f4bae8

update dep versions

25b0355

fix support for VLLM

ab065fd

remove qwen35 smoke launcher

ef1c92c

ErlisLushtaku changed the title ~~Support qwen 3.5~~ Support Qwen3.5 Apr 6, 2026

kargibora reviewed Apr 7, 2026

ErlisLushtaku force-pushed the erlislushtaku/fix/support-qwen-3.5 branch from ab3db1b to ef1c92c April 7, 2026 14:19

ErlisLushtaku added 2 commits April 7, 2026 16:23

use json schema structured outputs, tighten vllm range

32f2e7e

fix formatting

5f2edf0

ErlisLushtaku wants to merge 7 commits into main from erlislushtaku/fix/support-qwen-3.5

2 participants
