Expand evaluation harness for multi-document and metadata cases #75

## Summary
Expand the evaluation harness so compiler changes for multi-document output and metadata enrichment are measured before they become the default path.

## Why this work is needed
The repo already has an evaluation harness, but it does not yet capture the new compiler behaviors planned under this epic.

## Scope
- Add evaluation cases for multi-document outputs.
- Add evaluation cases for topic and metadata enrichment.
- Keep the harness usable for comparing results before and after a compiler change.
- Preserve compatibility with existing evaluation workflows.
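
As a rough illustration of what the new cases could look like, here is a minimal Python sketch. Everything in it is an assumption made for discussion: the `EvalCase` fields, the `check_case` helper, and the shape of the compiler output (a list of dicts, each with a `metadata` dict) are hypothetical and not taken from the existing harness. The defaults are chosen so that a case with no new expectations behaves like a single-document case, which is one possible way to stay compatible with existing workflows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalCase:
    """One hypothetical harness case; every field name is an assumption."""
    case_id: str
    source_text: str
    expected_document_count: int = 1          # default mirrors a single-document case
    expected_topics: frozenset = frozenset()  # topics that should appear somewhere in the output
    required_metadata_keys: frozenset = frozenset()  # keys every document should carry


def check_case(case, documents):
    """Return a list of failure messages; an empty list means the case passed.

    `documents` stands in for compiler output: a list of dicts, each with a
    "metadata" dict. The real output shape may differ.
    """
    failures = []
    if len(documents) != case.expected_document_count:
        failures.append(
            f"{case.case_id}: expected {case.expected_document_count} "
            f"documents, got {len(documents)}"
        )
    found_topics = set()
    for index, doc in enumerate(documents):
        metadata = doc.get("metadata", {})
        found_topics.update(metadata.get("topics", []))
        missing_keys = case.required_metadata_keys - set(metadata)
        if missing_keys:
            failures.append(
                f"{case.case_id}: document {index} missing metadata {sorted(missing_keys)}"
            )
    missing_topics = case.expected_topics - found_topics
    if missing_topics:
        failures.append(f"{case.case_id}: missing topics {sorted(missing_topics)}")
    return failures
```

A case would be checked with `check_case(case, compiler_output)`, and the returned messages could feed whatever reporting the current harness already uses.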

## Out of scope
- Hybrid retrieval evaluation.
- Frontend UX testing.

## Acceptance criteria
- The evaluation harness covers representative multi-document cases.
- Metadata-enrichment behavior is represented in the dataset or checks.
- Compiler changes in this stream can be regression-tested against the expanded harness.
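
One possible way to make the regression-testing criterion concrete is to compare per-case pass/fail results from a baseline run and a candidate run. The sketch below assumes each run can be reduced to a mapping from case ID to a boolean; the `compare_runs` name and that result format are hypothetical, not part of the current harness.

```python
def compare_runs(baseline, candidate):
    """Compare two runs given as {case_id: passed} mappings (assumed format).

    Returns sorted case IDs grouped by how their status changed.
    """
    shared = set(baseline) & set(candidate)
    return {
        # passed before, fails now
        "regressions": sorted(c for c in shared if baseline[c] and not candidate[c]),
        # failed before, passes now
        "fixed": sorted(c for c in shared if not baseline[c] and candidate[c]),
        # cases only present in the candidate run (e.g. newly added cases)
        "added": sorted(set(candidate) - set(baseline)),
        # cases that disappeared from the candidate run
        "missing": sorted(set(baseline) - set(candidate)),
    }
```

Under this sketch, a change would be flagged when `regressions` or `missing` is non-empty, while `added` would show the new multi-document and metadata cases entering the dataset.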

## Dependencies
- Depends on #62.
- Depends on #76.
- Depends on #77.
