Summary
Enable slot derivations to reference previously computed (derived) values, not just source fields, and implement the existing hide: true field to suppress intermediate slots from output.
Motivation
When a transformation needs multiple mapping stages — e.g., map two coded fields through lookup tables, then combine the results — there's no clean way to do it today:
- Throwaway intermediate slots produce unwanted output fields
- Inline case() expressions are unreadable for complex mappings
The hide field already exists on SlotDerivation in the spec but is not implemented. Combined with derived slot bindings, it enables a cascading pattern:
slot_derivations:
  _visit_label:
    hide: true
    populated_from: visit_code
    value_mappings:
      "1": "SCREENING"
      "7": "BASELINE"
  _site_label:
    hide: true
    populated_from: site_code
    value_mappings:
      "A": "Boston"
      "B": "Framingham"
  visit_site:
    expr: "slot('_visit_label') + ' at ' + slot('_site_label')"
Two pieces
1. Implement hide: true
Trivial — skip writing to tgt_attrs when slot_derivation.hide is True. Hidden slots still need to be computed.
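A minimal sketch of that behavior, assuming a simplified `SlotDerivation` dataclass and a hypothetical `derive_slots` helper rather than the project's actual transformer code. Every slot is computed and recorded, but only non-hidden slots are copied into `tgt_attrs`. Passing unmapped values through unchanged is also an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class SlotDerivation:
    """Simplified stand-in for the spec's SlotDerivation (assumption)."""
    name: str
    populated_from: Optional[str] = None
    value_mappings: dict = field(default_factory=dict)
    hide: bool = False


def derive_slots(source: dict, derivations: list) -> tuple:
    """Return (tgt_attrs, computed); hidden slots appear only in computed."""
    tgt_attrs: dict = {}
    computed: dict = {}
    for sd in derivations:
        raw: Any = source.get(sd.populated_from or sd.name)
        # Apply the lookup table if one is defined; unmapped values pass
        # through unchanged (an assumption of this sketch).
        value = sd.value_mappings.get(str(raw), raw) if sd.value_mappings else raw
        computed[sd.name] = value  # always computed, even when hidden
        if not sd.hide:
            tgt_attrs[sd.name] = value  # only visible slots reach the output
    return tgt_attrs, computed
```

Keeping `computed` separate from `tgt_attrs` is what lets later derivations read hidden values without those values leaking into the output.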
2. Derived slot bindings via slot() function
Reference previously computed slot values using a slot() function in expressions. Because a function name can't collide with a data field name, this avoids namespace collisions with source fields, unlike a {_variable} syntax, which could match a source column.
visit_site:
  expr: "slot('_visit_label') + ' at ' + slot('_site_label')"
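One way the lookup could work, sketched with a hypothetical `evaluate_expr` helper. Source fields are exposed as plain variable names, while derived values are reachable only through `slot()`, so a source column that happens to share a name with a derived slot does not shadow it. The use of `eval()` is for brevity only and is not a statement about how the project's expression evaluator works.

```python
def evaluate_expr(expr: str, source: dict, computed: dict):
    """Evaluate expr with source fields as variables and slot() for derived values."""

    def slot(name: str):
        # Look up a previously computed slot; fail loudly on forward references.
        if name not in computed:
            raise KeyError(f"slot {name!r} has not been computed yet")
        return computed[name]

    # eval() keeps the sketch short; it is not safe for untrusted expressions.
    return eval(expr, {"__builtins__": {}, "slot": slot}, dict(source))


# A source column deliberately named like a derived slot, to show there is no clash.
source = {"visit_code": "1", "site_code": "A", "_visit_label": "raw column"}
computed = {"_visit_label": "SCREENING", "_site_label": "Boston"}

print(evaluate_expr("slot('_visit_label') + ' at ' + slot('_site_label')", source, computed))
# SCREENING at Boston
```

Raising on an unknown slot name, rather than returning None, makes ordering mistakes visible immediately.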
Design considerations
- Ordering: Derived bindings imply that slot derivation evaluation order matters. Today slot_derivations is a dict (insertion-ordered in Python). This may need to be formalized.
- Reversibility: Chains of value_mappings with hide are more naturally reversible than complex case() expressions. The inverter could potentially reverse these with simple hints.
- Complexity: Derived bindings are a significant change to expression evaluation. This should be driven by concrete use cases, not implemented speculatively.
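If ordering were formalized by dependency rather than by insertion order, one possible approach is a topological sort over the `slot()` references found in each expression. The sketch below assumes references are always written as string literals, and uses a hypothetical `evaluation_order` helper built on the standard library's `graphlib` (Python 3.9+). It is an illustration of the idea, not a proposal for the actual implementation.

```python
import re
from graphlib import CycleError, TopologicalSorter

# Matches slot('name') or slot("name") with a literal argument.
SLOT_REF = re.compile(r"""slot\(\s*['"]([^'"]+)['"]\s*\)""")


def evaluation_order(exprs: dict) -> list:
    """Order slot names so each slot comes after the slots its expr references.

    exprs maps slot name -> expression string (None for slots without an expr).
    """
    graph = {name: set(SLOT_REF.findall(expr or "")) for name, expr in exprs.items()}

    # Reject references to slots that are not defined at all.
    unknown = set().union(*graph.values()) - graph.keys()
    if unknown:
        raise ValueError(f"undefined slot() references: {sorted(unknown)}")

    try:
        # static_order() yields each node after all of its predecessors.
        return list(TopologicalSorter(graph).static_order())
    except CycleError as err:
        raise ValueError(f"circular slot() references: {err.args[1]}") from err


order = evaluation_order({
    "visit_site": "slot('_visit_label') + ' at ' + slot('_site_label')",
    "_visit_label": None,
    "_site_label": None,
})
print(order[-1])  # visit_site
```

With this approach the order in which derivations are declared stops mattering, at the cost of parsing expressions up front. Relying on insertion order is simpler but requires authors to declare intermediate slots first.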
Related
- value_mappings with expr values (simpler, independent, should land first)
- expression_to_expression_mappings discussion