A115: disable priority LB child policy retention cache #541
apolcyn wants to merge 9 commits into grpc:master from
Conversation
markdroth
left a comment
Thanks for writing this up!
Please let me know if you have any questions.
> ## Abstract
>
> [A56](https://github.com/grpc/proposal/blob/master/A56-priority-lb-policy.md) describes
This statement is misleading. A56 specifies that the caching mechanism will be used in two cases:
1. When a higher-priority child becomes reachable, we deactivate the lower-priority children and eventually remove them.
2. When a child is removed from the LB policy config.

This gRFC is removing case (2) only. We did not discuss removing case (1), and I don't think it makes sense to do so.
Removing the caching only for case (2) is not going to be as simple as setting an integer (at least in Go), as discussed today.
To give a bit more context: we use a balancergroup, a helper that manages a set of LB policies. Callers can add and remove policies from the balancergroup, and it lets them control how state updates from those policies are aggregated. Our caching is implemented in the balancergroup, and this balancergroup is used by several LB policies: priority, rls, weightedtarget, and clustermanager.
We could add one flavor of the Remove method that supports caching and one that doesn't. But I'm wondering how the other languages implement this caching and how easy or hard it will be to support it only for case (2).
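For illustration, here is a minimal Go sketch of the two-flavor removal idea. The type and method names (`balancerGroup`, `remove`, `removeNow`) are hypothetical and do not reflect grpc-go's actual balancergroup API; this only shows the shape of the distinction between caching removal (case 1) and immediate removal (case 2):

```go
package main

import "fmt"

// child stands in for a managed LB policy.
type child struct{ name string }

// balancerGroup is a hypothetical stand-in for grpc-go's balancergroup
// helper; names and structure here are illustrative only.
type balancerGroup struct {
	active map[string]*child // currently running children
	cache  map[string]*child // deactivated children kept for possible reuse
}

func newBalancerGroup() *balancerGroup {
	return &balancerGroup{
		active: make(map[string]*child),
		cache:  make(map[string]*child),
	}
}

func (bg *balancerGroup) add(id string) {
	bg.active[id] = &child{name: id}
}

// remove deactivates a child but keeps it cached, as in case (1), where a
// lower-priority child may soon become active again.
func (bg *balancerGroup) remove(id string) {
	if c, ok := bg.active[id]; ok {
		delete(bg.active, id)
		bg.cache[id] = c
	}
}

// removeNow deletes a child immediately with no caching, as proposed for
// case (2), where the child was dropped from the LB policy config.
func (bg *balancerGroup) removeNow(id string) {
	delete(bg.active, id)
	delete(bg.cache, id)
}

func main() {
	bg := newBalancerGroup()
	bg.add("priority-0")
	bg.add("priority-1")

	bg.remove("priority-1")    // case (1): deactivated, cached for reuse
	bg.removeNow("priority-0") // case (2): gone for good

	fmt.Println(len(bg.active), len(bg.cache)) // 0 1
}
```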
I don't think we want to disable caching for case (1). If Go needs more work to do that, then we need to do that work.
FWIW, in C-core, we don't have any shared code for the policies you listed. The priority policy deactivates the child here for case 1 and here for case 2. So all we would need to do to remove this functionality for case 2 is to change the latter to immediately remove the child instead of deactivating it.
I can understand why Go might have used shared code for the priority, weighted_target, and xds_cluster_manager LB policies, since they all have this caching behavior. But I will note that the RLS policy was never supposed to have this caching behavior, so it seems strange to me that you'd also use it there.
> [[AA, BB, CC], [DD]] => [priority-0-3 priority-0-4]
> [[BB, CC], [AA, DD]] => [priority-0-0 priority-0-1]
> [[AA, BB, CC], [DD, EE]] => [priority-0-1 priority-0-2]
> [[BB, CC], [AA, DD]] => [priority-0-2 priority-0-3]
I think the algorithm should stick with child numbers 1 and 2 in this last step.
I've sent you grpc/grpc#41896 to add tests demonstrating the algorithm's behavior in C-core.
Hmm, the original text is what Go was doing (I tagged you on an internal doc to illustrate), but maybe that was specific to Go.
Changed to your original suggestion.
In the case you have now in the gRFC, in the third update, the algorithm will use child number 3 for [AA, BB] and reuse child number 2 for [CC, DD], because it was already using that number for [CC] in the second update.
If what you're seeing in Go matches what you currently have in the doc, then I suspect Go's implementation of this algorithm isn't quite right. @easwars
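To make the reuse rule concrete, here is a small Go sketch of one plausible reading of the child-numbering idea being debated. This is my own illustration under stated assumptions (each group reuses the number of the first previously seen overlapping address, provided that number is not already taken in the current update; otherwise it gets a fresh, never-reused number), not a transcription of the C-core or Go implementation:

```go
package main

import "fmt"

// assignChildNumbers is a hypothetical sketch of overlap-based name reuse.
// prevOwner maps each address from the previous update to its child number;
// nextFree is the next never-used number. It returns the numbers chosen for
// each group, the new address-to-number map, and the updated nextFree.
func assignChildNumbers(prevOwner map[string]int, nextFree int, groups [][]string) ([]int, map[string]int, int) {
	used := map[int]bool{}
	result := make([]int, len(groups))
	newOwner := map[string]int{}
	for i, g := range groups {
		num := -1
		// Reuse the number of the first overlapping address, if free.
		for _, addr := range g {
			if n, ok := prevOwner[addr]; ok && !used[n] {
				num = n
				break
			}
		}
		// No reusable number: allocate a fresh one, never recycled.
		if num == -1 {
			num = nextFree
			nextFree++
		}
		used[num] = true
		result[i] = num
		for _, addr := range g {
			newOwner[addr] = num
		}
	}
	return result, newOwner, nextFree
}

func main() {
	owner := map[string]int{}
	next := 0
	var nums []int

	// First update: two brand-new groups get numbers 0 and 1.
	nums, owner, next = assignChildNumbers(owner, next, [][]string{{"AA", "BB"}, {"CC"}})
	fmt.Println(nums) // [0 1]

	// Second update: [CC, DD] overlaps CC (number 1); [AA] overlaps AA (0).
	nums, owner, next = assignChildNumbers(owner, next, [][]string{{"CC", "DD"}, {"AA"}})
	fmt.Println(nums) // [1 0]
}
```

Running the disputed sequence through a sketch like this is a cheap way to check whether an implementation's numbering matches the intended algorithm before comparing against the tests in grpc/grpc#41896.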
Will take a look and get back asap.
ejona86
left a comment
Do we (all of us) have a plan for cluster_manager? Are we going to make a separate gRFC for that?
I don't think we need any changes there at this point. Arguably, the caching there is a gray area: I don't think any gRFC ever specified that we would actually have caching there, but all of our implementations do. On the other hand, I don't think the caching in that policy is actually causing any problems right now, so my inclination is to wait until we have a problem before changing it.
> Note this should be done for Java and Go only.
>
> For C-core, we are actually potentially getting benefit from this
> behavior due to subchannel sharing, so we're not planning to drop
s/sharing/pooling/, just to be precise.
Update on https://github.com/grpc/proposal/blob/master/A56-priority-lb-policy.md#child-lifetime-management