feat: added client side metric instrumentation to basic rpcs#1188
feat: added client side metric instrumentation to basic rpcs#1188daniel-sanche wants to merge 212 commits intomainfrom
Conversation
Co-authored-by: Mattie Fu <mattiefu@google.com>
| for i in range(num_retryable): | ||
| attempt = handler.completed_attempts[i] | ||
| assert isinstance(attempt, CompletedAttemptMetric) | ||
| assert attempt.end_status.name == "ABORTED" |
There was a problem hiding this comment.
Not related to the metrics, but ABORTED shouldn't be retried for mutate row?
There was a problem hiding this comment.
It doesn't look like it, and it's not listed as a retryable error in the service config. Is it in Java? It would be easy to add here
| assert attempt.end_status.value[0] == 0 | ||
| assert attempt.backoff_before_attempt_ns == 0 | ||
| assert ( | ||
| attempt.gfe_latency_ns > 0 and attempt.gfe_latency_ns < attempt.duration_ns |
There was a problem hiding this comment.
how is gfe_latency_ns injected to the header? this should be a number instead of a range since we can set it in the header.
There was a problem hiding this comment.
I think this is testing against the true backend response here, not a mocked value
These are system tests, so I was trying to limit the amount of mocking used here, although some other tests do inject fake exceptions into the stream to test retry logic
| final_attempt = handler.completed_attempts[num_retryable] | ||
| assert isinstance(final_attempt, CompletedAttemptMetric) | ||
| assert final_attempt.end_status.name == "PERMISSION_DENIED" | ||
| assert final_attempt.gfe_latency_ns is None |
There was a problem hiding this comment.
I think as long as the request gets to the server, gfe_latency_ns should not be none. So this is probably related to the test setup ?
There was a problem hiding this comment.
It looks like this test is testing the case where we have multiple transient retryable errors, before encountering a terminal error. For the errors, I'm using a custom error_injector class, to control which errors are triggered in which sequence. So this test wouldn't ever be reaching the real backend
It's been a while since I looked at this, but I remember having issues fully controlling the backend errors in the way I needed for some of these tests, which is why I used the error_injector. There are other tests in this file that send unauthorized requests to trigger a real backend failure though.
This PR builds off of #1187 to add instrumentation to basic data client rpcs (check_and_mutate, read_modify_write, sample_row_keys, mutate_row)
Metrics are not currently being exported anywhere, just collected and dropped. A future PR will add a GCP exporter to the system