Restore post-cloning PUF QRF re-imputation for geographic tax variation #560

## Background

PR #516 originally designed the calibration pipeline so that PUF + QRF imputation would run *after* cloning and geography assignment, giving each geographic clone geographically-informed tax imputations (with `state_fips` as a QRF predictor). This was a key part of the vision: households cloned into different states would receive state-appropriate imputed tax values rather than sharing identical federal-return-derived imputations.

## What Changed

Commit 49a1f66 removed the `--puf-dataset` flag from the calibration pipeline, moving PUF cloning upstream to `extended_cps.py`. As a result, all ~436 clones of the same household now share identical PUF-imputed tax values regardless of their assigned geography.

## Why This Was Deferred

Restoring post-cloning PUF re-imputation was deliberately deferred for several reasons:

1. **Matrix builder precomputation**: The current matrix builder precomputes variable values per state and reuses them across congressional districts. Post-cloning re-imputation would make tax values vary per clone (not just per state), breaking this optimization pattern.
2. **`X @ w ↔ sim.calculate().sum()` consistency**: Keeping the calibration matrix product consistent with the weighted simulation output is already a hard problem (see recent fixes for cross-state cache pollution, clone-to-CD collisions, and takeup draw alignment). Re-imputation per clone would add another dimension of potential mismatch.
3. **Runtime cost**: QRF training is expensive. Running it after cloning (on ~436× the original household count) would substantially increase pipeline runtime.
4. **Current targets don't require it**: The calibration weight solver already handles the geographic distribution of tax-related aggregates. Clones with identical values but different weights can still hit the same calibration targets.
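
To make point 4 concrete, here is a minimal sketch with made-up numbers. Everything in it is hypothetical: the toy households, the state targets, and the use of `np.linalg.lstsq` as a stand-in for the real calibration solver (which presumably also enforces constraints such as non-negative weights). It only illustrates that clones sharing identical tax values can still reproduce different state aggregates through their weights.

```python
import numpy as np

# Hypothetical toy data: 2 source households, each cloned into 2 states.
# Clone order: (hh0, A), (hh1, A), (hh0, B), (hh1, B).
tax_value = np.array([1_000.0, 4_000.0, 1_000.0, 4_000.0])  # identical across clones
clone_state = np.array([0, 0, 1, 1])                         # 0 = state A, 1 = state B

# One target row per state: a clone contributes only to its own state's row.
X = np.zeros((2, 4))
for s in range(2):
    in_state = clone_state == s
    X[s, in_state] = tax_value[in_state]

# Made-up state-level aggregates to calibrate against.
targets = np.array([50_000.0, 200_000.0])

# Stand-in for the real solver: minimum-norm least-squares weights.
w, *_ = np.linalg.lstsq(X, targets, rcond=None)

print("weights:", w)
print("X @ w:  ", X @ w)
```

The per-clone values are the same in both states, yet the weights differ, so the state totals are matched anyway. What this cannot capture is within-state differences in the imputed values themselves, which is what post-cloning re-imputation would add.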

## What Restoration Would Require

- Re-introduce PUF dataset loading and QRF imputation into the post-cloning pipeline stage
- Update the matrix builder to handle clone-varying tax variable values (cannot precompute per-state)
- Ensure `X*w` consistency when imputed values differ across clones of the same source household
- Profile and optimize QRF training for the larger post-cloning dataset
- Add validation that geographic tax variation improves microsimulation accuracy rather than just adding complexity
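
As a rough illustration of the matrix-builder and consistency items above, the sketch below builds state-level target rows directly from per-clone values, with no per-state precomputation, and checks them against a weighted, state-grouped sum. The function names `build_state_rows` and `simulated_state_totals` are hypothetical and are not part of the existing pipeline. The random values stand in for re-imputed tax variables, and the grouped sum stands in for weighted `sim.calculate()` output.

```python
import numpy as np

def build_state_rows(values, clone_state, n_states):
    """One row per state; column j holds clone j's own value in its state's row."""
    X = np.zeros((n_states, len(values)))
    X[clone_state, np.arange(len(values))] = values
    return X

def simulated_state_totals(values, weights, clone_state, n_states):
    """Stand-in for a weighted sum of simulation output, grouped by state."""
    return np.bincount(clone_state, weights=values * weights, minlength=n_states)

rng = np.random.default_rng(0)
n_states = 3
clone_state = np.array([0, 1, 2, 0, 1, 2])       # hypothetical geography assignment
values = rng.uniform(0, 5_000, size=6)           # clone-varying imputed values
weights = rng.uniform(1, 100, size=6)            # hypothetical calibration weights

X = build_state_rows(values, clone_state, n_states)
totals = simulated_state_totals(values, weights, clone_state, n_states)

# The consistency requirement: matrix product equals the weighted simulated totals.
assert np.allclose(X @ weights, totals)
```

In the real pipeline the two sides come from different code paths (the matrix builder and the simulation), so a check of this shape would only be meaningful if both read the same per-clone imputed values.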

## Acceptance Criteria

- [ ] PUF QRF re-imputation runs after cloning and geography assignment
- [ ] `state_fips` (or equivalent) is used as a QRF predictor so clones get state-appropriate tax values
- [ ] Matrix builder correctly handles clone-varying values
- [ ] `X @ w` matches `(sim.calculate(var) * w).sum()` for all tax-related target variables
- [ ] Pipeline runtime remains tractable (document benchmarks)
- [ ] Calibration results are at least as good as the current identical-clone approach

