Skip to content

feat: Reworked MutateRows to use the data client#1290

Open
gkevinzheng wants to merge 1 commit intov3_stagingfrom
mutate-rows
Open

feat: Reworked MutateRows to use the data client#1290
gkevinzheng wants to merge 1 commit intov3_stagingfrom
mutate-rows

Conversation

@gkevinzheng
Copy link
Contributor

Changes Made:

  • Rerouted MutateRows to use the data client's Table.bulk_mutate_rows function.
  • Separated bulk_mutate_rows into a function that constructs the _MutateRowsOperation object and the part that starts the operation. This is so we could inspect the operation's errors for custom error handling.
  • Reworked error handling in Table.mutate_rows. Since we must return a list of Status objects, we process the list of errors obtained from the _MutateRowsOperation back into a corresponding list of error statuses.
  • Reworked retries in Table.mutate_rows as follows:
    • retry.deadline -> operation_timeout
    • timeout/mutation_timeout -> attempt_timeout
    • A null retry deadline now defaults to the table default mutate rows operation timeout instead of being defined as no timeout. The docstring has been updated to reflect that. A retry deadline of 0.0 will still be a no-retry option, and is implemented by setting the list of retriable errors to the empty list, and the operation timeout to the default mutate rows operation timeout.
  • A timestamp of None in set_cell, which is the default, will now generate a client-side timestamp representing the current time, rather than defaulting to using the server-side timestamp. The docstrings for this have been updated to reflect this.

@gkevinzheng gkevinzheng requested review from a team as code owners February 5, 2026 22:15
@product-auto-label product-auto-label bot added the size: xl Pull request size is extra large. label Feb 5, 2026
@product-auto-label product-auto-label bot added the api: bigtable Issues related to the googleapis/python-bigtable API. label Feb 5, 2026
@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @gkevinzheng, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant refactoring of the Table.mutate_rows method, transitioning its implementation to leverage the data client's bulk mutation capabilities. This change not only streamlines the mutation process but also brings substantial improvements to error handling and retry logic, providing more precise control and predictable behavior. Additionally, the default timestamp generation for set_cell has been clarified to ensure client-side timestamp usage when no explicit timestamp is provided.

Highlights

  • Rerouted MutateRows: The Table.mutate_rows method has been refactored to utilize the data client's Table.bulk_mutate_rows function, aligning with modern client architecture.
  • Separated Operation Construction: The bulk_mutate_rows functionality is now split into two parts: one for constructing the _MutateRowsOperation object and another for initiating the operation. This separation allows for more granular inspection of operation errors.
  • Reworked Error Handling: Error handling in Table.mutate_rows has been enhanced. It now processes exceptions from _MutateRowsOperation into a corresponding list of google.rpc.status_pb2.Status objects, ensuring a consistent return type for all mutations.
  • Updated Retry Logic: The retry mechanism for Table.mutate_rows has been updated. retry.deadline is now mapped to operation_timeout, and timeout/mutation_timeout are mapped to attempt_timeout. A None retry deadline now defaults to the table's default mutate rows operation timeout, while a 0.0 deadline explicitly disables retries.
  • Client-Side Timestamp for set_cell: When timestamp is None in set_cell, it will now generate a client-side timestamp representing the current time, rather than defaulting to a server-side timestamp. Docstrings have been updated to reflect this change.
Changelog
  • google/cloud/bigtable/data/_async/client.py
    • Introduced _get_mutate_rows_operation to encapsulate the creation of the bulk mutate operation object.
    • Modified bulk_mutate_rows to utilize the new _get_mutate_rows_operation method.
  • google/cloud/bigtable/data/_sync_autogen/client.py
    • Added _get_mutate_rows_operation for synchronous bulk mutate operation object creation.
    • Updated bulk_mutate_rows to call _get_mutate_rows_operation.
  • google/cloud/bigtable/row.py
    • Updated docstrings for _set_cell and set_cell to clarify that timestamp=None now defaults to the current system time (client-side).
    • Adjusted _set_cell logic to generate a client-side timestamp when timestamp is None or _SERVER_SIDE_TIMESTAMP.
  • google/cloud/bigtable/table.py
    • Removed timeout and RetryError imports, added GoogleAPICallError, MutationsExceptionGroup, RowMutationEntry, TABLE_DEFAULT, code_pb2, and status_pb2 imports.
    • Removed _retriable_internal_server_error import.
    • Completely re-implemented mutate_rows to use the data client's _get_mutate_rows_operation.
    • Implemented new error handling within mutate_rows to catch MutationsExceptionGroup and translate individual errors into status_pb2.Status objects.
    • Revised retry logic: retry.deadline is now operation_timeout, timeout/mutation_timeout is attempt_timeout.
    • Changed None retry deadline to default to TABLE_DEFAULT.MUTATE_ROWS.
    • Implemented 0.0 retry deadline by setting retryable_errors to an empty list.
    • Removed the _RetryableMutateRowsWorker class and associated helper functions (_compile_mutation_entries, _check_row_table_name, _check_row_type).
  • tests/system/v2_client/test_data_api.py
    • Removed the _add_test_error_handler utility function.
    • Updated test_table_mutate_rows_retries_timeout to reflect the new retry mechanism, specifically testing the 0.0 deadline behavior.
    • Modified test_table_mutate_rows_input_errors to expect ValueError for invalid mutation counts (zero or too many) instead of InvalidArgument or TooManyMutationsError.
  • tests/unit/v2_client/test_row.py
    • Added constants _TIME_NS_RETURN_VALUE and _EXPECTED_DEFAULT_TIMESTAMP_MICROS for mocking time.
    • Updated _set_cell_helper to use mock.patch("time.time_ns") for consistent timestamp generation in tests.
    • Added test_direct_row_set_cell_with_server_side_timestamp to verify handling of _SERVER_SIDE_TIMESTAMP.
    • Removed unused google.rpc imports.
  • tests/unit/v2_client/test_table.py
    • Removed extensive unit tests for the deprecated _compile_mutation_entries, _check_row_table_name, _check_row_type, and _RetryableMutateRowsWorker components.
    • Refactored _table_mutate_rows_helper to mock the new _get_mutate_rows_operation and verify the new error processing logic.
    • Introduced _MockRow helper class for testing.
    • Updated test cases for mutate_rows to assert against the new operation_timeout, attempt_timeout, and retryable_errors parameters passed to the underlying operation.
Activity
  • No specific activity (comments, reviews, progress updates) was provided in the context for this pull request.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request effectively refactors MutateRows to use the new data client, which is a great improvement for consistency and maintainability. The separation of operation creation and execution, along with the updated error and retry handling, are well-implemented. I've added a couple of suggestions to improve code clarity in the new logic. Overall, this is a solid set of changes.

Comment on lines +164 to +165
if timestamp is None or timestamp == mutations._SERVER_SIDE_TIMESTAMP:
timestamp_micros = timestamp
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The logic for handling the timestamp argument is a bit condensed and could be made more explicit for better readability and maintainability. Expanding the if condition to handle the None and _SERVER_SIDE_TIMESTAMP cases separately would make the intent clearer.

Suggested change
if timestamp is None or timestamp == mutations._SERVER_SIDE_TIMESTAMP:
timestamp_micros = timestamp
if timestamp is None:
# Let SetCell handle client-side timestamp generation by passing None.
timestamp_micros = None
elif timestamp == mutations._SERVER_SIDE_TIMESTAMP:
# Use server-side timestamp.
timestamp_micros = mutations._SERVER_SIDE_TIMESTAMP

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of agree with this, it's easier to skim. But I also think the current code is totally fine

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another way to improve readability would just be to add a comment:

if timestamp is None or timestamp == mutations._SERVER_SIDE_TIMESTAMP:
    # preserve special-case values (client-side or server-side timestamp)
    timestamp_micros = timestamp

Comment on lines +745 to +754
retryable_errors = RETRYABLE_MUTATION_ERRORS
operation_timeout = retry.deadline

# The data client cannot take in zero or null values for deadline, so we set it to
# the default if that is the case. It shouldn't affect the behavior of the retry
# if a 0.0 deadline is set.
if not retry.deadline:
operation_timeout = TABLE_DEFAULT.MUTATE_ROWS
if retry.deadline == 0.0:
retryable_errors = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This block of code for handling retry.deadline can be refactored for better clarity. The current implementation with if not retry.deadline: handles both None and 0.0 cases, which can be a bit confusing. Explicitly checking for None and 0.0 would make the logic easier to follow and more maintainable.

Suggested change
retryable_errors = RETRYABLE_MUTATION_ERRORS
operation_timeout = retry.deadline
# The data client cannot take in zero or null values for deadline, so we set it to
# the default if that is the case. It shouldn't affect the behavior of the retry
# if a 0.0 deadline is set.
if not retry.deadline:
operation_timeout = TABLE_DEFAULT.MUTATE_ROWS
if retry.deadline == 0.0:
retryable_errors = []
retryable_errors = RETRYABLE_MUTATION_ERRORS
if retry.deadline is None:
# The data client cannot take in null values for deadline, so we set it to the default.
operation_timeout = TABLE_DEFAULT.MUTATE_ROWS
elif retry.deadline == 0.0:
# A deadline of 0.0 indicates a no-retry strategy.
# We set a default operation timeout and an empty list of retryable errors.
operation_timeout = TABLE_DEFAULT.MUTATE_ROWS
retryable_errors = []
else:
operation_timeout = retry.deadline

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this looks a bit cleaner

I'd also suggest doing retry.deadline == 0, instead of the 0.0 comparison

Comment on lines +745 to +754
retryable_errors = RETRYABLE_MUTATION_ERRORS
operation_timeout = retry.deadline

# The data client cannot take in zero or null values for deadline, so we set it to
# the default if that is the case. It shouldn't affect the behavior of the retry
# if a 0.0 deadline is set.
if not retry.deadline:
operation_timeout = TABLE_DEFAULT.MUTATE_ROWS
if retry.deadline == 0.0:
retryable_errors = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this looks a bit cleaner

I'd also suggest doing retry.deadline == 0, instead of the 0.0 comparison

# and set the status of that row entry to that grpc status. if none of the
# errors for a given index have gRPC status codes, return an UNKNOWN status
# with the first error message of each index.
for idx, errors in operation.errors.items():
Copy link
Contributor

@daniel-sanche daniel-sanche Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldn't have to go into the operation object like this. Can't you do something like:

try:
    self._table_impl.mutate_rows(
            mutation_entries,
            operation_timeout=operation_timeout,
            attempt_timeout=attempt_timeout,
            retryable_errors=retryable_errors,
        )
except MutationsExceptionGroup as exc_group:
    for error: FailedMutationEntryError in exc_group.exceptions:
        return_statuses[error.index] = status_pb2.Status(
            code=error.__cause__.grpc_status_code.value[0],
            message=error.__cause__.message,
            details=error.__cause__.details,
        )
return_statuses

(It looks like it could be a bit more complicated, since MutationExceptionGroup doesn't guarantee all exceptions are FailedMutationEntryErrors. But looking at the only place this exception seems to be raised, maybe that comment is wrong? I think you can change it to guarantee all sub-seceptions are FailedMutationEntryErrors, which would make parsing here simple)

attempt_timeout,
retryable_exceptions=retryable_excs,
)
return operation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a big fan of polluting this file like this.

I mentioned in another comment that we should be able to get the information we need by parsing the exception group, which means we wouldn't need to do this

But even if we ignore that thread, I'd say it could be cleaner to just copy these three lines building the operation directly in the shim, instead of adding this helper. That way we're polluting the shim codebase, instead of the data client

What do you think?

Comment on lines +164 to +165
if timestamp is None or timestamp == mutations._SERVER_SIDE_TIMESTAMP:
timestamp_micros = timestamp
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of agree with this, it's easier to skim. But I also think the current code is totally fine

Comment on lines +164 to +165
if timestamp is None or timestamp == mutations._SERVER_SIDE_TIMESTAMP:
timestamp_micros = timestamp
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another way to improve readability would just be to add a comment:

if timestamp is None or timestamp == mutations._SERVER_SIDE_TIMESTAMP:
    # preserve special-case values (client-side or server-side timestamp)
    timestamp_micros = timestamp

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api: bigtable Issues related to the googleapis/python-bigtable API. size: xl Pull request size is extra large.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants