CASSANDRA-20588 Fix single token batch atomicity with Accord/non-Accord batches#4684
CASSANDRA-20588 Fix single token batch atomicity with Accord/non-Accord batches#4684alanwang67 wants to merge 10 commits intoapache:trunkfrom
Conversation
| // We special case single token batch statements that span both Accord and non-Accord | ||
| // tables to go through the batch log in order to preserve all or nothing application | ||
| // see CASSANDRA-20588 for more details | ||
| boolean isSingleTokenBatchStatementSpanningAccordAndNonAccordTables = (mutations.size() == 1) && (containsAccordMutation(ClusterMetadata.current(), mutations.get(0))); |
There was a problem hiding this comment.
I think this is being done in the wrong file. Update StorageProxy.mutateWithConditions to split the mutations and then use the result of the split to decide what is happening. A helper function can iterate the split mutation and look for the pattern we want to specially route to the batch log path. You can pass the already split mutations into the atomic and non-atomic mutation methods to re-use that work.
The pattern to look for is split across Accord and non-Accord with more than one table present. You don't even need to have a set for tracking table UUID just remember the first one you see and if you see a different one in the other set of tables then you know it needs to go down the batch log route.
I don't love asking for this because it's gong to be messy, but every mutation has to pass through here so avoiding repeating map lookups and constructing filtered mutations seems worth it.
There was a problem hiding this comment.
I'm assuming you mean StorageProxy.mutateWithTriggers, as StorageProxy.mutateWithConditions does not exist. So currently, there are two methods that we branch to depending on whether or not we go through the batch log path: mutateAtomically and dispatchMutationsWithRetryOnDifferentSystem. These two methods use different SplitConsumer's on the list of Accord Mutations and the list of Normal Mutations after constructing them. This means that in order to reuse the values that we created, we will need to perform the creation of the SplitConsumer after we have determined which path we will go down. Doing so is a bit awkward since the two SplitConsumer's don't return the same type. Therefore, we would need to perform this as a side effect. However, this is a bit messy since SplitConsumer captures some state that is only later created by mutateAtomically and also this means we would have to allocate this state regardless of which branch we went down which seems a bit wasteful.
Some other alternatives I've considered are just applying the SplitConsumer lazily after we have passed in the two values. However, this means that we will have to reiterate over the list of Mutations which doesn't seem too great to do since we will have to pay this price for every Batch and Modification Statement.
However from your above comments, I changed the check to do a single pass and not allocate the filtered mutations and also inlined it so we only pay this overhead in the batch statement for the single mutation case.
Also glad to take any suggestions if there's a better way to do things or if the tradeoff is still worthwhile to do.
test/distributed/org/apache/cassandra/distributed/test/accord/AccordSingleTokenBatchTest.java
Outdated
Show resolved
Hide resolved
test/distributed/org/apache/cassandra/distributed/test/accord/AccordSingleTokenBatchTest.java
Outdated
Show resolved
Hide resolved
test/distributed/org/apache/cassandra/distributed/test/accord/AccordSingleTokenBatchTest.java
Outdated
Show resolved
Hide resolved
aweisberg
left a comment
There was a problem hiding this comment.
Basically +1 just use the tests transactional mode not hard coded full.
Use batch log for single token batches that span both Accord and non Accord tables to preserve all or nothing application.
patch by Alan Wang;
reviewed by for CASSANDRA-20588
Co-authored-by: Name1
Co-authored-by: Name2