Classify TaskCancelledException as client error instead of system error#5273
Draft
ahkcs wants to merge 4 commits intoopensearch-project:mainfrom
Draft
Classify TaskCancelledException as client error instead of system error#5273ahkcs wants to merge 4 commits intoopensearch-project:mainfrom
ahkcs wants to merge 4 commits intoopensearch-project:mainfrom
Conversation
When a user cancels a PPL query, the TaskCancelledException is wrapped by Calcite (RuntimeException → SQLException → TaskCancelledException) and reaches the REST error handler as a generic exception, resulting in a 500 status code and incrementing the system error metric. Since query cancellation is user-initiated, it should be classified as a client error (400) to avoid inflating system failure metrics. Signed-off-by: Kai Huang <ahkcs@amazon.com>
Contributor
|
Failed to generate code suggestions for PR |
TaskCancelledException extends OpenSearchException, so ErrorMessageFactory.unwrapCause() was finding it and creating an OpenSearchErrorMessage with exception.status() (500), overriding the 400 status passed from the REST handler. Handle TaskCancelledException before the OpenSearchException check to use the passed-in status for both the HTTP response and JSON body. Signed-off-by: Kai Huang <ahkcs@amazon.com>
Contributor
|
Failed to generate code suggestions for PR |
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Contributor
|
Failed to generate code suggestions for PR |
ErrorMessage.fetchReason() returns "Invalid Query" for all 400 status codes. Now that TaskCancelledException returns 400, the reason should be "Query cancelled" instead of "Invalid Query". Signed-off-by: Kai Huang <ahkcs@amazon.com>
Contributor
|
Failed to generate code suggestions for PR |
Collaborator
Author
|
Note: |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
When a user cancels a PPL query via the
_tasks/_cancelAPI, theTaskCancelledExceptionis wrapped by Calcite (RuntimeException → SQLException → TaskCancelledException) and reaches the REST error handler as a generic exception. This results in:PPL_FAILED_REQ_COUNT_SYS(system error metric) instead ofPPL_FAILED_REQ_COUNT_CUS(client error metric)Since query cancellation is user-initiated, it should not be classified as a system error.
Changes
hasCauseOf()helper to walk the exception cause chainTaskCancelledExceptiontoisClientError()via cause chain checkTaskCancelledExceptionRelated
Test plan
TaskCancelledExceptionclassified as client errorTaskCancelledExceptionclassified as client errorRuntimeExceptionstill classified as system errorOSD side log: