[refactor](variant) normalize nested search predicate field resolution#61548
[refactor](variant) normalize nested search predicate field resolution#61548eldenmoon wants to merge 2 commits intoapache:masterfrom
Conversation
Issue Number: None Related PR: None Problem Summary: Nested search predicates were parsed inconsistently across code paths. Queries inside `NESTED(path, ...)` had to repeat the full nested prefix, unsupported nested forms were validated late, and normalized field bindings could diverge from the field paths pushed down to thrift. This change centralizes nested field path construction and normalizes child predicates against the active nested path during parsing. It applies the same validation rules in standard and lucene modes, rejects unsupported nested forms earlier, and keeps normalized field bindings aligned with generated thrift search params. The added FE tests cover standard mode, lucene mode, invalid nested syntax, and thrift serialization of normalized nested fields. Normalize FE handling of nested search predicates for Variant search DSL. Fields inside `NESTED(path, ...)` must now be written relative to the nested path, and unsupported forms such as absolute nested field references, bare queries, nested `NESTED(...)`, and non-top-level `NESTED` clauses now fail with explicit syntax errors. - Test: Not run in this session (message-only amend; the code change adds FE test coverage) - Behavior changed: Yes (nested predicates now require relative field references inside `NESTED(path, ...)`) - Does this need documentation: No
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
There was a problem hiding this comment.
Pull request overview
Centralizes and normalizes nested field-path handling for the Variant search DSL so NESTED(path, ...) predicates are parsed consistently (standard + lucene), validated earlier, and keep FE field bindings aligned with the thrift pushdown representation.
Changes:
- Refactor FE parser to build field paths in one place and normalize nested predicate fields relative to the active
NESTEDpath (with earlier validation for unsupported nested forms). - Expand FE test coverage for standard/lucene nested parsing, invalid nested syntax, and thrift serialization of normalized nested field bindings.
- Remove a no-op loop in BE
variant_util.cpp.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| fe/fe-core/src/main/java/org/apache/doris/analysis/SearchDslParser.java | Adds shared field-path builder + nested-relative normalization and enforces top-level-only NESTED validation in both modes. |
| fe/fe-core/src/test/java/org/apache/doris/nereids/trees/expressions/functions/scalar/SearchDslParserTest.java | Updates existing nested tests to use relative field syntax and adds new standard/lucene + invalid-form coverage. |
| fe/fe-core/src/test/java/org/apache/doris/analysis/SearchPredicateTest.java | Adds a thrift-serialization test to verify nested-relative fields are normalized before pushdown. |
| be/src/exec/common/variant_util.cpp | Removes an empty/unused loop in path-stats checking logic. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| private static String buildFieldPath(SearchParser.FieldPathContext ctx) { | ||
| if (ctx == null) { | ||
| throw new RuntimeException("Invalid field query: missing field path"); |
There was a problem hiding this comment.
buildFieldPath() throws a generic RuntimeException for a missing field path. That ends up being wrapped as an "Unexpected error parsing search DSL" in parseDsl*, which is misleading for user-input syntax issues. Prefer throwing SearchDslSyntaxException here so callers consistently surface an "Invalid search DSL" / syntax error message instead of an internal/unexpected error.
| throw new RuntimeException("Invalid field query: missing field path"); | |
| throw new SearchDslSyntaxException("Invalid field query: missing field path"); |
| Assertions.assertEquals("data.items.msg", param.field_bindings.get(0).field_name); | ||
| Assertions.assertEquals("data", param.field_bindings.get(0).parent_field_name); | ||
| Assertions.assertEquals("items.msg", param.field_bindings.get(0).subcolumn_path); | ||
| Assertions.assertEquals("data.items.meta.channel", param.field_bindings.get(1).field_name); | ||
| Assertions.assertEquals("data", param.field_bindings.get(1).parent_field_name); | ||
| Assertions.assertEquals("items.meta.channel", param.field_bindings.get(1).subcolumn_path); |
There was a problem hiding this comment.
This test asserts specific ordering of param.field_bindings (indexes 0/1). Field bindings are derived from AST traversal/set insertion order, so the exact order can change with harmless refactors and make the test flaky. Consider asserting the bindings by field_name (e.g., lookup by name or assert the set of names/parent_field_name/subcolumn_path) rather than relying on list position.
| Assertions.assertEquals("data.items.msg", param.field_bindings.get(0).field_name); | |
| Assertions.assertEquals("data", param.field_bindings.get(0).parent_field_name); | |
| Assertions.assertEquals("items.msg", param.field_bindings.get(0).subcolumn_path); | |
| Assertions.assertEquals("data.items.meta.channel", param.field_bindings.get(1).field_name); | |
| Assertions.assertEquals("data", param.field_bindings.get(1).parent_field_name); | |
| Assertions.assertEquals("items.meta.channel", param.field_bindings.get(1).subcolumn_path); | |
| Map<String, TSearchFieldBinding> bindingsByFieldName = new HashMap<>(); | |
| for (TSearchFieldBinding binding : param.field_bindings) { | |
| bindingsByFieldName.put(binding.field_name, binding); | |
| } | |
| TSearchFieldBinding msgBinding = bindingsByFieldName.get("data.items.msg"); | |
| Assertions.assertNotNull(msgBinding); | |
| Assertions.assertEquals("data", msgBinding.parent_field_name); | |
| Assertions.assertEquals("items.msg", msgBinding.subcolumn_path); | |
| TSearchFieldBinding metaChannelBinding = bindingsByFieldName.get("data.items.meta.channel"); | |
| Assertions.assertNotNull(metaChannelBinding); | |
| Assertions.assertEquals("data", metaChannelBinding.parent_field_name); | |
| Assertions.assertEquals("items.meta.channel", metaChannelBinding.subcolumn_path); |
FE UT Coverage ReportIncrement line coverage |
TPC-H: Total hot run time: 26818 ms |
TPC-DS: Total hot run time: 168222 ms |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: QsLuceneModeAstBuilder's boolean-chain collector skipped atomClause.nestedQuery, so Lucene mode rejected valid top-level NESTED(...) queries with "Invalid atom clause: missing field or bare query" before nested path normalization and top-level validation could run.
### Release note
Fix SearchDslParser so Lucene mode accepts valid top-level NESTED predicates and still rejects non-top-level NESTED placement with the intended validation error.
### Check List (For Author)
- Test: FE unit test
- ./run-fe-ut.sh --run org.apache.doris.nereids.trees.expressions.functions.scalar.SearchDslParserTest
- Behavior changed: Yes (Lucene-mode Search DSL now parses valid top-level NESTED queries instead of failing early)
- Does this need documentation: No
|
run buildall |
|
/review |
Code Review SummaryPR: refactor normalize nested search predicate field resolution I've performed a thorough review of this PR against the code-review skill checklist. The changes centralize nested field path construction, normalize child predicates during parsing, apply consistent validation across standard and lucene modes, and add comprehensive tests. Critical Checkpoint ConclusionsGoal and correctness: The PR's goal is to make fields inside Modification scope: Focused and minimal. The SearchDslParser.java changes extract duplicated field-path-building code into Concurrency: Both Lifecycle / save-restore pattern: Parallel code paths: Both Error handling: All new error paths throw Behavioral change / compatibility: This is a behavioral change — previously, users had to write absolute paths inside NESTED (e.g., Test coverage: Comprehensive. The PR adds/updates tests covering:
BE dead code removal: The removed loop in Configuration / incompatible changes: No new config items. The behavioral change (relative vs absolute field paths in NESTED) is a breaking change for existing DSL queries using absolute paths inside NESTED. No issues found. The refactoring is clean, well-tested, and improves code organization by eliminating duplication and catching errors earlier. |
TPC-H: Total hot run time: 26634 ms |
TPC-DS: Total hot run time: 167710 ms |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
Problem Summary:
Nested search predicates were parsed inconsistently across code paths. Queries inside
NESTED(path, ...)had to repeat the full nested prefix, unsupported nested forms were validated late, and normalized field bindings could diverge from the field paths pushed down to thrift.This change centralizes nested field path construction and normalizes child predicates against the active nested path during parsing. It applies the same validation rules in standard and lucene modes, rejects unsupported nested forms earlier, and keeps normalized field bindings aligned with generated thrift search params. The added FE tests cover standard mode, lucene mode, invalid nested syntax, and thrift serialization of normalized nested fields.
Normalize FE handling of nested search predicates for Variant search DSL. Fields inside
NESTED(path, ...)must now be written relative to the nested path, and unsupported forms such as absolute nested field references, bare queries, nestedNESTED(...), and non-top-levelNESTEDclauses now fail with explicit syntax errors.NESTED(path, ...))