
Expand evaluation harness for multi-document and metadata cases #75

@voidrot


Summary

Expand the evaluation harness so compiler changes for multi-document output and metadata enrichment are measured before they become the default path.

Why this work is needed

The repo already has an evaluation harness, but it does not yet capture the new compiler behaviors planned under this epic.

Scope

  • Add evaluation cases for multi-document outputs.
  • Add evaluation cases for topic and metadata enrichment.
  • Keep the harness usable for comparing results before and after a compiler change.
  • Preserve compatibility with existing evaluation workflows.

Out of scope

  • Hybrid retrieval evaluation.
  • Frontend UX testing.

Acceptance criteria

  • The evaluation harness covers representative multi-document cases.
  • Metadata-enrichment behavior is represented in the dataset or checks.
  • Compiler changes in this stream can be regression-tested.
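This issue does not describe the harness's case format. The following is a rough sketch of how a combined multi-document and metadata regression check could look. It assumes Python and uses hypothetical names (`EvalCase`, `CompiledDoc`, `check_case`) that are not taken from the repo. Each case lists the documents the compiler is expected to emit, along with the metadata each document must carry. The check reports any missing, unexpected, or duplicate documents and any metadata mismatches.

```python
from dataclasses import dataclass


@dataclass
class CompiledDoc:
    """Hypothetical shape of one document emitted by the compiler."""
    title: str
    metadata: dict[str, str]


@dataclass
class EvalCase:
    """Hypothetical evaluation case: one input, several expected output documents."""
    name: str
    source_text: str
    # expected document title -> metadata keys/values that document must carry
    expected: dict[str, dict[str, str]]


def check_case(case: EvalCase, outputs: list[CompiledDoc]) -> list[str]:
    """Compare compiler outputs with the case; an empty list means the case passed."""
    failures: list[str] = []
    by_title = {doc.title: doc for doc in outputs}
    if len(by_title) != len(outputs):
        failures.append(f"{case.name}: duplicate document titles in output")

    # Multi-document check: exactly the expected set of documents is produced.
    missing = sorted(set(case.expected) - set(by_title))
    extra = sorted(set(by_title) - set(case.expected))
    if missing:
        failures.append(f"{case.name}: missing documents {missing}")
    if extra:
        failures.append(f"{case.name}: unexpected documents {extra}")

    # Metadata-enrichment check: each expected key/value is present on its document.
    for title, required in case.expected.items():
        doc = by_title.get(title)
        if doc is None:
            continue  # already reported as missing
        for key, value in required.items():
            actual = doc.metadata.get(key)
            if actual != value:
                failures.append(
                    f"{case.name}: {title!r} has {key}={actual!r}, expected {value!r}"
                )
    return failures


if __name__ == "__main__":
    case = EvalCase(
        name="two-topic-note",
        source_text="(input text)",
        expected={
            "Billing": {"topic": "billing"},
            "Onboarding": {"topic": "onboarding"},
        },
    )
    outputs = [
        CompiledDoc("Billing", {"topic": "billing"}),
        CompiledDoc("Onboarding", {"topic": "support"}),
    ]
    print(check_case(case, outputs))
    # prints ["two-topic-note: 'Onboarding' has topic='support', expected 'onboarding'"]
```

Returning a list of failure messages instead of raising on the first one lets a single harness run report every regression a compiler change introduces across both dimensions.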

Dependencies

Metadata

Assignees: No one assigned
Labels: No labels
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests