perf(common): filter pushthrough optimization for decode param filter #2035

Open
fordN wants to merge 2 commits into main from ford/optimizations/decode-param-filter-pushthrough

@fordN fordN commented Mar 27, 2026

This PR adds an optimization step that automatically rewrites filters on evm_decode output fields into equivalent filters on raw topic columns when the field corresponds to an indexed event parameter. The derived topic filter participates in all existing pushdown optimizations (statistics pruning, bloom filters, row-level pushdown) before any decoding occurs.

Background

When a query filters on an output parameter produced by decoding logs, the query engine must first decode the logs and then apply the filter. In some cases the filtering can be done ahead of decoding by applying an understanding of how indexed parameters are encoded. For example, if the first indexed parameter is an address, decoding it simply means removing the left padding from the topic1 value. A discerning analyst with this knowledge can optimize their query by filtering on topic1 before decoding.
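The padding relationship described above can be sketched in a few lines of Rust. This is a minimal illustration, not code from this PR: the helper names `address_to_topic`, `topic_to_address`, and `to_hex` are hypothetical, and the only fact relied on is that an indexed `address` is stored in its topic as a 32-byte word with 12 leading zero bytes.

```rust
/// Left-pad a 20-byte address to the 32-byte word stored in a topic column.
/// (Hypothetical helper for illustration; not part of this PR.)
fn address_to_topic(address: [u8; 20]) -> [u8; 32] {
    let mut topic = [0u8; 32];
    // The address occupies the low-order 20 bytes; the first 12 stay zero.
    topic[12..].copy_from_slice(&address);
    topic
}

/// Inverse direction: strip the padding to recover the address.
/// Returns None if the 12 padding bytes are not all zero.
fn topic_to_address(topic: [u8; 32]) -> Option<[u8; 20]> {
    if topic[..12].iter().any(|&b| b != 0) {
        return None;
    }
    let mut address = [0u8; 20];
    address.copy_from_slice(&topic[12..]);
    Some(address)
}

/// Render bytes as uppercase hex, matching the X'...' literals in the queries.
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02X}", b)).collect()
}
```

Because the mapping is a fixed, lossless padding, an equality test on the decoded address is equivalent to an equality test on the padded topic, which is what makes filtering before decoding safe for this parameter type.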

Original query:

select
    pc._block_num,
    pc.address as token_address,
    pc.dec['from'] as from_address            
from (
    select
        l._block_num,
        l.address,
        evm_decode(l.topic1, l.topic2, l.topic3, l.data, 'Transfer(address indexed from, address indexed to, uint256 value)') as dec
    from "test/eth_mainnet@1.0.0".logs l
    where
        l.topic0 = evm_topic('Transfer(address indexed from, address indexed to, uint256 value)') and
        l.topic3 IS NULL and l._block_num > 21004434) pc
where 
    pc.dec['from'] = X'70CF99553471FE6C0D513EBFAC8ACC55BA02AB7B'

Optimized query:

select
    pc._block_num,
    pc.address as token_address,
    pc.dec['from'] as from_address            
from (
    select
        l._block_num,
        l.address,
        evm_decode(l.topic1, l.topic2, l.topic3, l.data, 'Transfer(address indexed from, address indexed to, uint256 value)') as dec
    from "test/eth_mainnet@1.0.0".logs l
    where
        l.topic1 = X'00000000000000000000000070CF99553471FE6C0D513EBFAC8ACC55BA02AB7B' and
        l.topic0 = evm_topic('Transfer(address indexed from, address indexed to, uint256 value)') and
        l.topic3 IS NULL and l._block_num > 21004434) pc

Note the added topic1 predicate in the optimized query, which filters the data for the specified from wallet address before decoding the logs.

This optimization can be applied automatically, so users don't need a deep understanding of how blockchain data is encoded to optimize their decode queries. This PR adds that optimization step, so the original query and the optimized query above end up with the same query execution plan!

Changes

  • New filter_pushthrough_decode module — logical plan rewrite + 16 tests (filter_pushthrough_decode.rs)
  • Added topic_names() and find_indexed_param() to Event (evm_common.rs)
  • Exposed evm_common as pub(crate) so the optimizer can access Event (udfs.rs)
  • Integrated rewrite in SessionState::optimize before DF's optimizer + debug_span! for tracing (session_state.rs)
  • Supports equality (=) and IN list on indexed value types (address, bool, uint<N>, int<N>, bytes<N>)
  • Correctly skips reference types (string, bytes, arrays) where the topic is a keccak256 hash
  • Original decoded-field filter always retained for correctness; derived filter is additive (AND)
  • Handles both inline evm_decode(...) calls and column references through subquery projections
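The value-type versus reference-type distinction in the list above can be sketched as follows. This is an illustrative approximation under stated assumptions, not the PR's implementation: `TopicKind`, `classify`, and the `encode_*` helpers are hypothetical names, integers are limited to 128 bits to stay within the standard library, and the encodings follow the usual ABI word layout (integers left-padded or sign-extended, `bytes<N>` right-padded).

```rust
/// How a Solidity type behaves as an indexed event parameter.
/// (Hypothetical type for illustration; not part of this PR.)
#[derive(Debug, PartialEq)]
enum TopicKind {
    /// Stored in the topic as a padded 32-byte word; a filter can be derived.
    Value,
    /// Stored as a keccak256 hash of the contents; no filter is derived.
    Reference,
}

fn classify(ty: &str) -> TopicKind {
    // Arrays (`T[]`, `T[k]`) and tuples are hashed when indexed.
    if ty.ends_with(']') || ty.starts_with('(') {
        return TopicKind::Reference;
    }
    // True for `<prefix><digits>`, e.g. `uint256` or `bytes32`.
    // Bare `bytes` (dynamic) and unsized aliases fall through to Reference,
    // which is the conservative choice: no derived filter is emitted.
    let sized = |prefix: &str| {
        ty.strip_prefix(prefix)
            .map_or(false, |rest| !rest.is_empty() && rest.bytes().all(|b| b.is_ascii_digit()))
    };
    if ty == "address" || ty == "bool" || sized("uint") || sized("int") || sized("bytes") {
        TopicKind::Value
    } else {
        TopicKind::Reference
    }
}

/// bool: a 32-byte word whose last byte is 0 or 1.
fn encode_bool(v: bool) -> [u8; 32] {
    let mut word = [0u8; 32];
    word[31] = v as u8;
    word
}

/// uint<N>: big-endian, left-padded with zeros (u128 is an assumption here).
fn encode_uint(v: u128) -> [u8; 32] {
    let mut word = [0u8; 32];
    word[16..].copy_from_slice(&v.to_be_bytes());
    word
}

/// int<N>: big-endian two's complement, sign-extended to 32 bytes.
fn encode_int(v: i128) -> [u8; 32] {
    let fill: u8 = if v < 0 { 0xFF } else { 0x00 };
    let mut word = [fill; 32];
    word[16..].copy_from_slice(&v.to_be_bytes());
    word
}

/// bytes<N>: right-padded with zeros; None if the length is not 1..=32.
fn encode_fixed_bytes(v: &[u8]) -> Option<[u8; 32]> {
    if v.is_empty() || v.len() > 32 {
        return None;
    }
    let mut word = [0u8; 32];
    word[..v.len()].copy_from_slice(v);
    Some(word)
}
```

For an IN list, the same encoding would be applied to each literal in the list. Reference types are skipped because the topic holds only a hash, so a literal in the decoded-field filter cannot be mapped to a topic value by padding alone.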

Example

For Transfer(address indexed from, address indexed to, uint256 value):

WHERE dec['from'] = X'70cf99...'

The optimizer derives and injects:

AND topic1 = X'00000000000000000000000070cf99...'
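As a rough sketch of how such a derivation might work, the snippet below maps a decoded field name to its topic column by scanning the event signature, then builds the padded literal for an address. The functions `find_indexed_topic` and `derive_topic_filter` are hypothetical and are not the `find_indexed_param()` or `topic_names()` added in this PR. The sketch assumes a non-anonymous event (so topic0 holds the signature hash) and a flat parameter list without tuples.

```rust
/// Locate a named indexed parameter in an event signature.
/// Returns (topic column name, Solidity type), or None if the field is not
/// an indexed parameter. (Hypothetical helper; not part of this PR.)
fn find_indexed_topic(signature: &str, field: &str) -> Option<(String, String)> {
    let open = signature.find('(')?;
    let close = signature.rfind(')')?;
    let params = signature.get(open + 1..close)?;

    // topic0 is the event signature hash, so indexed params start at topic1.
    let mut topic_index = 0;
    for param in params.split(',') {
        let tokens: Vec<&str> = param.split_whitespace().collect();
        // Indexed parameters are written as `<type> indexed <name>`.
        if let [ty, "indexed", name] = tokens.as_slice() {
            topic_index += 1;
            if *name == field {
                return Some((format!("topic{}", topic_index), ty.to_string()));
            }
        }
    }
    None
}

/// Build the derived predicate text for an equality filter on an indexed
/// address field. Only `address` is handled in this sketch.
fn derive_topic_filter(signature: &str, field: &str, address: [u8; 20]) -> Option<String> {
    let (topic, ty) = find_indexed_topic(signature, field)?;
    if ty != "address" {
        return None;
    }
    // Left-pad the 20-byte address to a 32-byte topic word.
    let mut word = [0u8; 32];
    word[12..].copy_from_slice(&address);
    let hex: String = word.iter().map(|b| format!("{:02x}", b)).collect();
    Some(format!("{} = X'{}'", topic, hex))
}
```

Non-indexed parameters such as `value` yield no derived filter, since they live in the data column rather than in a topic.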

Plan shape before

Filter [dec['from'] = X'70cf99...']
  Projection [evm_decode(...) as dec]
    Filter [topic0 = X'ddf252...']
      DataSourceExec

Plan shape after

Filter [dec['from'] = X'70cf99...']
  Projection [evm_decode(...) as dec]
    Filter [topic0 = X'ddf252...' AND topic1 = X'000...70cf99...']
      DataSourceExec
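The before/after plan shapes can be modeled with a toy plan tree. This is a deliberately simplified sketch: the `Plan` enum and `push_through` function are hypothetical, predicates are plain strings, and the derived predicate is supplied by the caller rather than computed from the filter. It is not the logical plan type or the rewrite used in this PR.

```rust
/// A toy logical plan with just enough structure to show the rewrite.
/// (Hypothetical type for illustration; not the engine's plan type.)
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan,
    /// Predicates are implicitly ANDed together.
    Filter { predicates: Vec<String>, input: Box<Plan> },
    Projection { exprs: Vec<String>, input: Box<Plan> },
}

/// For the shape Filter -> Projection -> X, keep the outer filter untouched
/// and AND `derived` into a filter below the projection. Any other shape is
/// returned unchanged.
fn push_through(plan: Plan, derived: &str) -> Plan {
    match plan {
        Plan::Filter { predicates, input } => match *input {
            Plan::Projection { exprs, input: inner } => {
                let below = match *inner {
                    // An existing filter below the projection gains the new conjunct.
                    Plan::Filter { mut predicates, input } => {
                        predicates.push(derived.to_string());
                        Plan::Filter { predicates, input }
                    }
                    // Otherwise a new filter is introduced above the input.
                    other => Plan::Filter {
                        predicates: vec![derived.to_string()],
                        input: Box::new(other),
                    },
                };
                Plan::Filter {
                    predicates, // original decoded-field filter is retained
                    input: Box::new(Plan::Projection { exprs, input: Box::new(below) }),
                }
            }
            other => Plan::Filter { predicates, input: Box::new(other) },
        },
        other => other,
    }
}
```

Keeping the outer filter means the rewrite only ever narrows what reaches the decoder, so the query result does not depend on the derived predicate being exact.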

Local benchmark: Transfer events filtered by from address

Query scanning ~2.39M Transfer event rows across 3 row groups, filtering on dec['from'] for a specific address (97 matching rows).

| Metric | Before | After | Change |
|---|---|---|---|
| bytes_scanned | 69.43 MB | 45.62 MB | -34% |
| predicate_cache_records | 7.61M | 4.36M | -43% |
| predicate_cache_inner_records | 13.04M | 9.55M | -27% |
| row_pushdown_eval_time | 597.87ms | 127.75ms | -79% |
| time_elapsed_scanning_total | 3.19s | 1.62s | -49% |
| time_elapsed_processing | 3.22s | 1.67s | -48% |
| output_rows | 97 | 97 | identical |

The largest win is row pushdown evaluation time dropping 79% — the derived topic1 filter eliminates rows before they reach the expensive evm_decode path. Bloom filter pruning now also participates via required_guarantees=[..., topic1 in (...)].

These gains are on a small dataset (3 row groups). For queries scanning hundreds of row groups, bloom filter pruning should be able to skip entire row groups, which is expected to give larger improvements.

@fordN fordN force-pushed the ford/optimizations/decode-param-filter-pushthrough branch from 4e8e924 to 88970a7, March 27, 2026 18:05
fordN added 2 commits March 27, 2026 11:06
Automatically rewrites filters on evm_decode output fields into
equivalent filters on raw topic columns when the field corresponds to
an indexed event parameter. The derived topic filter is pushed down to
the parquet scan level, enabling statistics pruning, bloom filters, and
row-level pushdown before any decoding occurs.
@fordN fordN force-pushed the ford/optimizations/decode-param-filter-pushthrough branch from 88970a7 to 940c620, March 27, 2026 18:06
@fordN fordN self-assigned this Mar 27, 2026