small features: add option to save cache in parquet, save judge input… by geoalgo · Pull Request #35 · OpenEuroLLM/JudgeArena

geoalgo · 2026-04-08T09:50:54Z

…, improve error handling of openrouter, remove compute_cohen_kappa

Reasons:

Saving in Parquet is better in terms of storage.
Saving the judge input keeps all the context used to make the judge call and makes it possible to estimate the cost.
The error handling for OpenRouter becomes a bit less wacky.
cohen_kappa is available in scikit-learn; we should use that implementation instead (it produces the same values).
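
For reference, a minimal sketch of what Cohen's kappa computes, checked against scikit-learn's `cohen_kappa_score` when that package is installed. The helper name `cohen_kappa_by_hand` and the sample labels are illustrative assumptions, not the code removed in this PR.

```python
import math
from collections import Counter


def cohen_kappa_by_hand(labels_a, labels_b):
    """Cohen's kappa for two raters; hypothetical helper for illustration only."""
    n = len(labels_a)
    # Observed agreement: fraction of items on which both raters agree.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement under independence, from each rater's label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_expected = sum(
        counts_a[k] * counts_b[k] for k in set(counts_a) | set(counts_b)
    ) / (n * n)
    if p_expected == 1.0:
        # Kappa is undefined here; this sketch returns NaN.
        return float("nan")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Made-up preferences from two judges over six battles.
judge_1 = ["A", "A", "B", "B", "A", "B"]
judge_2 = ["A", "B", "B", "B", "A", "A"]
kappa = cohen_kappa_by_hand(judge_1, judge_2)

try:
    # scikit-learn is optional for this sketch.
    from sklearn.metrics import cohen_kappa_score

    assert math.isclose(kappa, cohen_kappa_score(judge_1, judge_2))
except ImportError:
    pass
```

With these labels the raters agree on 4 of 6 items and the expected agreement is 0.5, so kappa is 1/3.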

ErlisLushtaku · 2026-04-09T21:09:27Z

judgearena/evaluate.py

+    completion_A: str  # completion of the first model
+    completion_B: str  # completion of the second model
+    judge_completion: str  # output of the judge
+    judge_input: str | None = None  # input that was passed to the judge


Should this be added to the estimate_elo_ratings.py workflow as well?

It uses judge_and_parse_prefs from this file so it is updated as well if I am not mistaken.

Yes, but I think we are dropping it there because we construct the DataFrame manually.

ErlisLushtaku · 2026-04-09T21:34:00Z

judgearena/utils.py

+                    return x
+
+                for col in df.select_dtypes(include="object").columns:
+                    df[col] = df[col].apply(_to_python).astype(str)


I think we should be careful here if the dataframe can contain missing values. Calling .astype(str) on missing values (None or np.nan) converts them into strings "None" and "nan". When the parquet file is read back, they would be processed as strings instead of missing values.

Yes, I agree, but I don't see another way to serialize to Parquet. This conversion does lose the missingness information, but I think all downstream code should probably exclude empty strings too when computing annotations.
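
To make the concern concrete, here is a small sketch contrasting a blanket string conversion with one that leaves missing entries alone. The helper `_str_or_none` and the sample values are assumptions for illustration; this is not the code in `judgearena/utils.py`.

```python
import math

import pandas as pd


def _str_or_none(value):
    """Hypothetical helper: stringify a value but keep None/NaN as missing."""
    if value is None:
        return None
    if isinstance(value, float) and math.isnan(value):
        return None
    return str(value)


# An object column mixing strings, missing values, and a non-string value.
column = pd.Series(["prompt", None, float("nan"), 3], dtype=object)

# Blanket conversion: missing entries become the literal strings "None" and "nan".
naive = column.map(str)

# Conversion that skips missing entries, so isna() still identifies them.
safe = column.map(_str_or_none)

print(naive.tolist())
print(safe.isna().tolist())
```

Parquet writers generally store such missing entries as nulls, so the second variant should round-trip as missing values rather than as text, although that depends on the engine used.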

small features: add option to save cache in parquet, save judge input…

ee04924

…, improve error handling of openrouter

geoalgo requested a review from ErlisLushtaku April 8, 2026 09:50

ErlisLushtaku reviewed Apr 9, 2026

geoalgo wants to merge 1 commit into main from small_features

geoalgo commented Apr 8, 2026
ErlisLushtaku Apr 9, 2026
geoalgo Apr 10, 2026
ErlisLushtaku Apr 11, 2026
ErlisLushtaku Apr 9, 2026
geoalgo Apr 10, 2026