feat: Reworked MutateRows to use the data client#1290
feat: Reworked MutateRows to use the data client#1290gkevinzheng wants to merge 1 commit intov3_stagingfrom
Conversation
Summary of ChangesHello @gkevinzheng, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a significant refactoring of the Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request effectively refactors MutateRows to use the new data client, which is a great improvement for consistency and maintainability. The separation of operation creation and execution, along with the updated error and retry handling, are well-implemented. I've added a couple of suggestions to improve code clarity in the new logic. Overall, this is a solid set of changes.
| if timestamp is None or timestamp == mutations._SERVER_SIDE_TIMESTAMP: | ||
| timestamp_micros = timestamp |
There was a problem hiding this comment.
The logic for handling the timestamp argument is a bit condensed and could be made more explicit for better readability and maintainability. Expanding the if condition to handle the None and _SERVER_SIDE_TIMESTAMP cases separately would make the intent clearer.
| if timestamp is None or timestamp == mutations._SERVER_SIDE_TIMESTAMP: | |
| timestamp_micros = timestamp | |
| if timestamp is None: | |
| # Let SetCell handle client-side timestamp generation by passing None. | |
| timestamp_micros = None | |
| elif timestamp == mutations._SERVER_SIDE_TIMESTAMP: | |
| # Use server-side timestamp. | |
| timestamp_micros = mutations._SERVER_SIDE_TIMESTAMP |
There was a problem hiding this comment.
I kind of agree with this, it's easier to skim. But I also think the current code is totally fine
There was a problem hiding this comment.
Another way to improve readability would just be to add a comment:
if timestamp is None or timestamp == mutations._SERVER_SIDE_TIMESTAMP:
# preserve special-case values (client-side or server-side timestamp)
timestamp_micros = timestamp
| retryable_errors = RETRYABLE_MUTATION_ERRORS | ||
| operation_timeout = retry.deadline | ||
|
|
||
| # The data client cannot take in zero or null values for deadline, so we set it to | ||
| # the default if that is the case. It shouldn't affect the behavior of the retry | ||
| # if a 0.0 deadline is set. | ||
| if not retry.deadline: | ||
| operation_timeout = TABLE_DEFAULT.MUTATE_ROWS | ||
| if retry.deadline == 0.0: | ||
| retryable_errors = [] |
There was a problem hiding this comment.
This block of code for handling retry.deadline can be refactored for better clarity. The current implementation with if not retry.deadline: handles both None and 0.0 cases, which can be a bit confusing. Explicitly checking for None and 0.0 would make the logic easier to follow and more maintainable.
| retryable_errors = RETRYABLE_MUTATION_ERRORS | |
| operation_timeout = retry.deadline | |
| # The data client cannot take in zero or null values for deadline, so we set it to | |
| # the default if that is the case. It shouldn't affect the behavior of the retry | |
| # if a 0.0 deadline is set. | |
| if not retry.deadline: | |
| operation_timeout = TABLE_DEFAULT.MUTATE_ROWS | |
| if retry.deadline == 0.0: | |
| retryable_errors = [] | |
| retryable_errors = RETRYABLE_MUTATION_ERRORS | |
| if retry.deadline is None: | |
| # The data client cannot take in null values for deadline, so we set it to the default. | |
| operation_timeout = TABLE_DEFAULT.MUTATE_ROWS | |
| elif retry.deadline == 0.0: | |
| # A deadline of 0.0 indicates a no-retry strategy. | |
| # We set a default operation timeout and an empty list of retryable errors. | |
| operation_timeout = TABLE_DEFAULT.MUTATE_ROWS | |
| retryable_errors = [] | |
| else: | |
| operation_timeout = retry.deadline |
There was a problem hiding this comment.
I agree this looks a bit cleaner
I'd also suggest doing retry.deadline == 0, instead of the 0.0 comparison
| retryable_errors = RETRYABLE_MUTATION_ERRORS | ||
| operation_timeout = retry.deadline | ||
|
|
||
| # The data client cannot take in zero or null values for deadline, so we set it to | ||
| # the default if that is the case. It shouldn't affect the behavior of the retry | ||
| # if a 0.0 deadline is set. | ||
| if not retry.deadline: | ||
| operation_timeout = TABLE_DEFAULT.MUTATE_ROWS | ||
| if retry.deadline == 0.0: | ||
| retryable_errors = [] |
There was a problem hiding this comment.
I agree this looks a bit cleaner
I'd also suggest doing retry.deadline == 0, instead of the 0.0 comparison
| # and set the status of that row entry to that grpc status. if none of the | ||
| # errors for a given index have gRPC status codes, return an UNKNOWN status | ||
| # with the first error message of each index. | ||
| for idx, errors in operation.errors.items(): |
There was a problem hiding this comment.
You shouldn't have to go into the operation object like this. Can't you do something like:
try:
self._table_impl.mutate_rows(
mutation_entries,
operation_timeout=operation_timeout,
attempt_timeout=attempt_timeout,
retryable_errors=retryable_errors,
)
except MutationsExceptionGroup as exc_group:
for error: FailedMutationEntryError in exc_group.exceptions:
return_statuses[error.index] = status_pb2.Status(
code=error.__cause__.grpc_status_code.value[0],
message=error.__cause__.message,
details=error.__cause__.details,
)
return_statuses
(It looks like it could be a bit more complicated, since MutationExceptionGroup doesn't guarantee all exceptions are FailedMutationEntryErrors. But looking at the only place this exception seems to be raised, maybe that comment is wrong? I think you can change it to guarantee all sub-seceptions are FailedMutationEntryErrors, which would make parsing here simple)
| attempt_timeout, | ||
| retryable_exceptions=retryable_excs, | ||
| ) | ||
| return operation |
There was a problem hiding this comment.
I'm not a big fan of polluting this file like this.
I mentioned in another comment that we should be able to get the information we need by parsing the exception group, which means we wouldn't need to do this
But even if we ignore that thread, I'd say it could be cleaner to just copy these three lines building the operation directly in the shim, instead of adding this helper. That way we're polluting the shim codebase, instead of the data client
What do you think?
| if timestamp is None or timestamp == mutations._SERVER_SIDE_TIMESTAMP: | ||
| timestamp_micros = timestamp |
There was a problem hiding this comment.
I kind of agree with this, it's easier to skim. But I also think the current code is totally fine
| if timestamp is None or timestamp == mutations._SERVER_SIDE_TIMESTAMP: | ||
| timestamp_micros = timestamp |
There was a problem hiding this comment.
Another way to improve readability would just be to add a comment:
if timestamp is None or timestamp == mutations._SERVER_SIDE_TIMESTAMP:
# preserve special-case values (client-side or server-side timestamp)
timestamp_micros = timestamp
Changes Made:
Table.bulk_mutate_rowsfunction.bulk_mutate_rowsinto a function that constructs the_MutateRowsOperationobject and the part that starts the operation. This is so we could inspect the operation's errors for custom error handling.Table.mutate_rows. Since we must return a list ofStatusobjects, we process the list of errors obtained from the_MutateRowsOperationback into a corresponding list of error statuses.Table.mutate_rowsas follows:retry.deadline->operation_timeouttimeout/mutation_timeout->attempt_timeouttimestampofNoneinset_cell, which is the default, will now generate a client-side timestamp representing the current time, rather than defaulting to using the server-side timestamp. The docstrings for this have been updated to reflect this.