Question about sampling 256 samples for C4 dataset evaluation

Why does the C4 evaluation sample 256 segments of length seqlen? I am referring specifically to the get_c4 and get_c4_new functions in datautils.py.
The paper states that 128 samples are drawn from C4 as the calibration set, but I could not find any explanation in the paper for why 256 samples are used when evaluating perplexity (PPL) on C4.
Should we evaluate on the full C4 dataset instead? Could anyone familiar with this explain? Thanks a lot!
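
To make the question concrete, here is a minimal sketch of what I understand the evaluation sampling to be doing. It is not the actual code from datautils.py: the function names, the seed, and the default seqlen of 2048 are my own assumptions, and the documents are assumed to be already tokenized into lists of token ids.

```python
import math
import random


def sample_eval_windows(docs, nsamples=256, seqlen=2048, seed=0):
    """Draw `nsamples` random windows of `seqlen` tokens from tokenized docs.

    Hypothetical helper for illustration only; `docs` is a list of
    token-id lists. Documents shorter than `seqlen` are skipped.
    """
    eligible = [d for d in docs if len(d) >= seqlen]
    if not eligible:
        raise ValueError("no document is at least seqlen tokens long")
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    windows = []
    for _ in range(nsamples):
        doc = eligible[rng.randrange(len(eligible))]
        start = rng.randint(0, len(doc) - seqlen)  # inclusive upper bound
        windows.append(doc[start:start + seqlen])
    return windows


def perplexity(window_nlls, seqlen):
    """Perplexity from per-window summed negative log-likelihoods (in nats)."""
    total_tokens = len(window_nlls) * seqlen
    return math.exp(sum(window_nlls) / total_tokens)
```

Under this reading, the reported PPL is an estimate over nsamples * seqlen tokens from a fixed random subset rather than over the whole dataset, which is what my question is about.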


<img width="946" height="810" alt="Image" src="https://github.com/user-attachments/assets/a793f7b0-7e38-4665-8aae-c4408b07630a" />

Question about sampling 256 samples for C4 dataset evaluation #65

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Question about sampling 256 samples for C4 dataset evaluation #65

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions