Skip to content

Initial implementation of report-builder interface#5266

Draft
Swiddis wants to merge 14 commits intoopensearch-project:mainfrom
Swiddis:feat/report-builder
Draft

Initial implementation of report-builder interface#5266
Swiddis wants to merge 14 commits intoopensearch-project:mainfrom
Swiddis:feat/report-builder

Conversation

@Swiddis
Copy link
Collaborator

@Swiddis Swiddis commented Mar 25, 2026

Description

Well, this is turning out quite long.

The fundamental idea of this PR is to add a lot more context to error messages. For example, take this bad query:

source = big5
| eval range_bucket = case(
   `metrics.size` < -10, 'range_1',
   `metrics.size` >= -10 and `metrics.size` < 10, 'range_2',
   `metrics.size` >= 10 and `metrics.size` < 100, 'range_3',
   `metrics.size` >= 100 and `metrics.size` < 1000, 'range_4',
   `metrics.size` >= 1000 and `metrics.size` < 2000, 'range_5',
   `metrics.size` >= 2000, 'range_6')
| stats count() by range_bucket, span(`@timestamp`, 1h) as auto_span
| sort + range_bucket, + auto_spa

Now there's more context, provided at various layers:

{
  "error": {
    "context": {
      "stage": "analyzing",
      "stage_description": "Validating the query"
    },
    "details": "Field [auto_spa] not found.",
    "location": [
      "checking if fields and indices exist"
    ],
    "code": "FIELD_NOT_FOUND",
    "type": "IllegalArgumentException"
  },
  "status": 400
}

A lot of this PR in terms of line count is just patching specific places where tests are asserting errors behave in specific ways. The core of the PR is ErrorReport and StageErrorHandler which adds utilities for incrementally building detailed errors, QueryService where we hook up the outer shell of the stage handlers, and RestPPLQueryAction where we handle and present the errors.

The enhancements aren't super complete in this PR as the emphasis is scaffolding (and the integration work + fixing tests is already quite significant), for the most part the changes are backwards compatible in normal cases. The type, details, and status fields are in the same spots and by default pull from the underlying exception.

Related Issues

#5261, #4919

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
@github-actions
Copy link
Contributor

Failed to generate code suggestions for PR

@Swiddis Swiddis added the enhancement New feature or request label Mar 25, 2026
@github-actions
Copy link
Contributor

Failed to generate code suggestions for PR

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
@Swiddis Swiddis force-pushed the feat/report-builder branch from c1c2355 to 23c5fbb Compare March 25, 2026 23:45
@github-actions
Copy link
Contributor

Failed to generate code suggestions for PR

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
@github-actions
Copy link
Contributor

Failed to generate code suggestions for PR

Swiddis added 3 commits March 26, 2026 12:04
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
@github-actions
Copy link
Contributor

Failed to generate code suggestions for PR

ahkcs added a commit to ahkcs/sql that referenced this pull request Mar 26, 2026
NPE in the analytics path is a server bug (null schema field, missing
row), not bad user input. Removed from client error list. Will sync
this classification with RestPPLQueryAction updates in opensearch-project#5266.

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
@github-actions
Copy link
Contributor

github-actions bot commented Mar 26, 2026

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit d25006e.

PathLineSeverityDescription
integ-test/src/test/java/org/opensearch/sql/datasource/DataSourceEnabledIT.java153lowDebug logging statements (`System.err.println`) printing full HTTP response bodies were added to a test helper method. While in a test file, this could inadvertently log sensitive response payloads (tokens, credentials) during CI runs where stderr is captured and stored. Appears to be accidental debug code left uncommitted rather than malicious intent.
core/src/main/java/org/opensearch/sql/calcite/utils/CalciteToolsHelper.java437lowThe `enrichErrorsForSpecialCases` method extracts and places the raw Calcite query plan text (including SQL AST fragments) into the `context` map under key `plan`, which is subsequently serialized directly into the client-facing API error response via `ErrorReport.toJsonMap()` → `ErrorMessage.getErrorAsJson()`. This leaks internal query plan structure to external API consumers. The intent appears to be improving debuggability, not malicious exfiltration, but represents unintended information disclosure.

The table above displays the top 10 most important findings.

Total: 2 | Critical: 0 | High: 0 | Medium: 0 | Low: 2


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions
Copy link
Contributor

Failed to generate code suggestions for PR

@Swiddis
Copy link
Collaborator Author

Swiddis commented Mar 26, 2026

The error enrichment logic extracts the raw SQL query plan string from internal exception messages and...

In context, this is pulling details out of a potentially very fuzzy Calcite error. The error logic without any of this ultimately returns the plan with a message resembling "Failed to prepare plan: [blob]." The plan is provided as context as the innermost error might be weird and only apply to one node, e.g. CalciteEnumerableNestedAggregate having a problem means nothing without access to the plan containing said aggregate. In OSD this plan probably won't be presented to users directly.

Any details about our internal implementation can be easily scrutinized by navigating to https://github.com/opensearch-project/sql in a recent version of Netscape Navigator.

ahkcs added a commit that referenced this pull request Mar 26, 2026
…5267)

* [Mustang] Add query routing and execution handoff for Parquet-backed indices (#5247)

Implements the query routing and AnalyticsExecutionEngine for Project
Mustang's unified query pipeline. PPL queries targeting parquet_ prefixed
indices are routed through UnifiedQueryPlanner and executed via a stub
QueryPlanExecutor, with results formatted through the existing JDBC
response pipeline.

New files:
- QueryPlanExecutor: @FunctionalInterface contract for analytics engine
- AnalyticsExecutionEngine: converts Iterable<Object[]> to QueryResponse
  with type mapping and query size limit enforcement
- RestUnifiedQueryAction: orchestrates schema building, planning,
  execution on sql-worker thread pool, with client/server error
  classification and metrics
- StubQueryPlanExecutor: canned data for parquet_logs and parquet_metrics
  tables for development and testing

Modified files:
- RestPPLQueryAction: routing branch for parquet_ indices
- SQLPlugin: passes ClusterService and NodeClient to RestPPLQueryAction
- plugin/build.gradle: adds :api dependency for UnifiedQueryPlanner

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Align isClientError with RestPPLQueryAction classification

Add missing exception types to isClientError(): IndexNotFoundException,
ExpressionEvaluationException, QueryEngineException,
DataSourceClientException, IllegalAccessException. Matches the full
list in RestPPLQueryAction.isClientError().

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Move stub code into analytics/stub sub-package

Extract StubSchemaProvider, StubQueryPlanExecutor, and StubIndexDetector
into plugin/.../rest/analytics/stub/ package to clearly separate
temporary stub code from production code. RestUnifiedQueryAction now
delegates to these stub classes.

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Use ErrorMessageFactory for error responses

Replace hand-crafted JSON error response with
ErrorMessageFactory.createErrorMessage(), matching the standard error
format used in RestPPLQueryAction.reportError().

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Make metrics and response formatting query-type-aware

Use QueryType to select the correct metrics (PPL_REQ_TOTAL vs REQ_TOTAL,
PPL_FAILED_REQ_COUNT_* vs FAILED_REQ_COUNT_*) and LangSpec (PPL_SPEC
vs SQL_SPEC) so this class can serve both PPL and SQL queries when
unified SQL support is added.

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Move analytics routing from REST layer to transport layer

Move the analytics index routing check from RestPPLQueryAction into
TransportPPLQueryAction.doExecute(). This ensures the analytics path
gets the same PPL enabled check, metrics, request ID, and inter-plugin
transport support as the existing Lucene path. RestPPLQueryAction and
SQLPlugin are reverted to their original state.

Added executeViaTransport() to RestUnifiedQueryAction which returns
results via ActionListener<TransportPPLQueryResponse> instead of
RestChannel, integrating properly with the transport action pattern.

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Add query size limit to RelNode plan instead of post-processing

Add LogicalSystemLimit to the RelNode plan before passing it to the
analytics engine, consistent with PPL V3 (QueryService.convertToCalcitePlan).
This ensures the analytics engine enforces the limit during execution
rather than returning all rows for post-processing truncation.

Remove post-processing querySizeLimit truncation from
AnalyticsExecutionEngine -- the limit is now part of the plan the
executor receives.

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Propagate security context to sql-worker thread pool

Wrap scheduled lambdas in both execute() and executeViaTransport() with
withCurrentContext() to capture and restore ThreadContext (user identity,
permissions, audit trail) on the worker thread. Follows the same pattern
as OpenSearchQueryManager.withCurrentContext().

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Move analytics routing after profiling setup

Move the analytics index routing check after QueryContext.setProfile()
and wrapWithProfilingClear(listener). Use clearingListener instead of
raw listener so profiling thread-local state is properly cleaned up
after analytics path execution.

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Remove NPE from isClientError classification

NPE in the analytics path is a server bug (null schema field, missing
row), not bad user input. Removed from client error list. Will sync
this classification with RestPPLQueryAction updates in #5266.

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Remove duplicate REST execution path from RestUnifiedQueryAction

Remove execute(query, queryType, channel), doExecute(), createQueryListener(channel),
recordSuccessMetric(), recordFailureMetric(), reportError(), and related
REST imports. Since routing now goes through TransportPPLQueryAction,
the REST-specific path was unused. Renamed executeViaTransport() to
execute() as the sole entry point.

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>

---------

Signed-off-by: Kai Huang <kaihuang@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
@github-actions
Copy link
Contributor

Failed to generate code suggestions for PR

@github-actions
Copy link
Contributor

Failed to generate code suggestions for PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant