I’d like to ask whether the VBench-Long evaluation uses `clip_length_mix.yaml` for all dimensions.
When using `clip_length_mix.yaml`, it seems that the settings `imaging_quality: 2.0` and `aesthetic_quality: 2.0` are never actually applied!
In the code:

`length_values = [clip_lengths[dim] for dim in unique_dimensions if dim in clip_lengths]`
Since these two dimensions are grouped together with `overall_consistency` in `VBench_full_info.json`, running

`python vbench2_beta_long/eval_long.py --videos_path xx --dimension aesthetic_quality`
will always result in

`length_values = [2.0, 10.0, 2.0]`

so

`max_length = max(length_values) if length_values else None`

will always be `10.0`. Therefore, for the `imaging_quality` and `aesthetic_quality` splits, the evaluation always uses the 10.0 clip-length setting.
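A minimal sketch of the behavior described above (the dictionary contents and the grouping of dimensions are my assumptions based on the snippets quoted here, not a copy of the actual config or JSON):

```python
# Sketch of the length-selection logic described above.
# clip_lengths mimics the relevant entries of clip_length_mix.yaml (assumed values);
# unique_dimensions mimics the prompt split in VBench_full_info.json, where
# imaging_quality / aesthetic_quality are grouped with overall_consistency.
clip_lengths = {
    "imaging_quality": 2.0,
    "aesthetic_quality": 2.0,
    "overall_consistency": 10.0,
}
unique_dimensions = ["imaging_quality", "overall_consistency", "aesthetic_quality"]

length_values = [clip_lengths[dim] for dim in unique_dimensions if dim in clip_lengths]
max_length = max(length_values) if length_values else None

print(length_values)  # [2.0, 10.0, 2.0]
print(max_length)     # 10.0 -- the two 2.0 settings are never selected
```

Because `max()` is taken over the whole group, any dimension sharing a split with `overall_consistency` inherits its 10.0 length.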
I’m wondering whether this is the intended behavior, or whether it should be adjusted manually?