Enable fast WOSAC evals on large datasets with resampling by daphne-cornelisse · Pull Request #280 · Emerge-Lab/PufferDrive

daphne-cornelisse · 2026-02-06T01:08:20Z

Description

Problem

When evaluating WOSAC on large datasets, the current implementation loads all target scenes into memory simultaneously, leading to excessive memory consumption and potential out-of-memory errors.

Solution

This PR refactors the evaluation pipeline to iterate through the dataset incrementally, loading and processing scenes in batches rather than all at once.
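To illustrate the idea of incremental processing, here is a minimal sketch of batched iteration with a running aggregate. It is not the code from this PR: `iter_batches`, `evaluate_incrementally`, `load_scene`, and `score_scene` are hypothetical names chosen for the example, and the real evaluator computes WOSAC realism metrics rather than a plain mean.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

S = TypeVar("S")  # scene identifier type
T = TypeVar("T")  # loaded scene type


def iter_batches(scene_ids: Iterable[S], batch_size: int) -> Iterator[List[S]]:
    """Yield consecutive batches of scene ids without materializing the full list."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch: List[S] = []
    for scene_id in scene_ids:
        batch.append(scene_id)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final, possibly smaller, batch
        yield batch


def evaluate_incrementally(
    scene_ids: Iterable[S],
    batch_size: int,
    load_scene: Callable[[S], T],       # hypothetical loader
    score_scene: Callable[[T], float],  # hypothetical per-scene metric
) -> float:
    """Return the mean score while holding at most one batch of scenes in memory."""
    total, count = 0.0, 0
    for batch in iter_batches(scene_ids, batch_size):
        scenes = [load_scene(scene_id) for scene_id in batch]
        for scene in scenes:
            total += score_scene(scene)
            count += 1
        del scenes  # drop the references before loading the next batch
    return total / count if count else float("nan")
```

Only the running totals survive across batches, so peak memory is bounded by the batch size rather than by the dataset size.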

Usage

Configure the sampling parameters like this:

[eval]
; Number of scenarios to process in each batch
wosac_batch_size = 10
; Target number of unique scenarios to evaluate on
wosac_target_scenarios = 50
; Total number of scenarios to sample from
wosac_scenario_pool_size = 1000
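As a sanity check on these values, the following sketch parses the `[eval]` section with Python's standard `configparser` and validates the three settings. This is an assumption-laden illustration, not how PufferDrive loads its configuration: `read_wosac_settings` is a hypothetical helper, and the only constraints it enforces are that every value is positive and that the target number of unique scenarios cannot exceed the pool it is drawn from.

```python
import configparser

# The example values from the PR description; ';' starts a comment line.
EXAMPLE_INI = """
[eval]
; Number of scenarios to process in each batch
wosac_batch_size = 10
wosac_target_scenarios = 50
wosac_scenario_pool_size = 1000
"""

KEYS = ("wosac_batch_size", "wosac_target_scenarios", "wosac_scenario_pool_size")


def read_wosac_settings(ini_text: str) -> dict:
    """Parse the [eval] section and check the settings for basic consistency."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # parser.getint raises NoOptionError if a key is missing.
    settings = {key: parser.getint("eval", key) for key in KEYS}
    for key, value in settings.items():
        if value <= 0:
            raise ValueError(f"{key} must be positive, got {value}")
    if settings["wosac_target_scenarios"] > settings["wosac_scenario_pool_size"]:
        # You cannot collect more unique scenarios than the pool contains.
        raise ValueError("wosac_target_scenarios exceeds wosac_scenario_pool_size")
    return settings


if __name__ == "__main__":
    print(read_wosac_settings(EXAMPLE_INI))
```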

Run:

puffer eval puffer_drive --eval.wosac-realism-eval True

>>> Running WOSAC realism evaluation with training dataset. 

Collecting rollout 14/32...████████████████████████████                                                | 40/100 [00:10<00:16,  3.73%/s, n=20, batch=4]

...

Implementation

Refactor the scene resampling logic into a separate function
Use the util function in both training and eval code
Simplify sampling logic: the num_maps -> num_agents mapping is handled under the hood in the WOSAC eval, so users can think in scene units (easier for WOSAC eval purposes)
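The resampling idea in the points above can be sketched roughly as follows: draw a batch of scenario indices from the pool, keep only the ones not seen yet, and repeat until the target number of unique scenarios is reached. This is a simplified stand-in under stated assumptions, not the function added in this PR. `resample_until_target` and its parameters are hypothetical names, scenarios are represented as plain integer indices, and the num_maps -> num_agents mapping is not modeled.

```python
import random
from typing import Iterator, List, Set


def resample_until_target(
    pool_size: int, target: int, batch_size: int, seed: int = 0
) -> Iterator[List[int]]:
    """Yield batches of previously unseen scenario indices until `target` are collected."""
    if not 0 < target <= pool_size:
        raise ValueError("target must be in the range 1..pool_size")
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    rng = random.Random(seed)
    seen: Set[int] = set()
    while len(seen) < target:
        # Sample a batch without replacement from the whole pool.
        batch = rng.sample(range(pool_size), k=min(batch_size, pool_size))
        # Drop duplicates from earlier batches and cap at what is still needed.
        fresh = [i for i in batch if i not in seen][: target - len(seen)]
        seen.update(fresh)
        if fresh:
            yield fresh


if __name__ == "__main__":
    batches = list(resample_until_target(pool_size=1000, target=50, batch_size=10))
    print(len(batches), sum(len(b) for b in batches))
```

Because duplicates are dropped, a batch can contribute fewer new scenarios than the batch size, so the number of batches needed to reach the target is not fixed in advance.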

WaelDLZ

Nice !

daphne-cornelisse added 3 commits February 5, 2026 19:02

Improve random map resampling code.

ea49a64

Refactor env: Create separate resampling function.

b936083

Works with wosac_use_map_as_resampling_target = False.

45dd56e

daphne-cornelisse requested a review from WaelDLZ February 6, 2026 01:08

More elegant and clean solution to map resampling.

fcff9fb

daphne-cornelisse marked this pull request as ready for review February 6, 2026 16:08

daphne-cornelisse requested a review from eugenevinitsky February 6, 2026 16:09

This comment was marked as resolved.

Clean up.

7391f9d

Emerge-Lab deleted a comment from greptile-apps Bot Feb 6, 2026

daphne-cornelisse changed the title ~~Enable fast WOSAC evals on large datasets with resampling.~~ Enable fast WOSAC evals on large datasets with resampling Feb 6, 2026

daphne-cornelisse added 8 commits February 6, 2026 11:28

Put batch iteration in the WOSACEvaluator class to avoid repetition (keep puffer code clean).

14d646e

Improve naming.

7b8259c

Clean up code and drop duplicates.

0e8cb42

Fix util functions so that we can run wosac during training.

958fde9

Log more metrics to wandb.

cf077d6

Remove unused variable map_idex.

d3646ac

Fix human replay eval.

23d0209

Drop last scenario from batch as a safety measure.

6ddfef3

daphne-cornelisse added benchmarking labels Feb 6, 2026

WaelDLZ approved these changes Feb 6, 2026

daphne-cornelisse merged commit ec149b8 into 2.0 Feb 6, 2026
14 checks passed

daphne-cornelisse deleted the dc/wosac_eval_with_resampling branch February 6, 2026 22:44

riccardosavorgnan mentioned this pull request Mar 3, 2026

ricky/merge conflicts 3.0 beta and 3.0 #325

Merged

daphne-cornelisse merged 13 commits into 2.0 from dc/wosac_eval_with_resampling
