[RUM-13793] 🐛 Skip potential sanitize updates on unaltered fields#4298
[RUM-13793] 🐛 Skip potential sanitize updates on unaltered fields#4298
Conversation
0a59001 to
c9c4d23
Compare
Bundles Sizes Evolution
🚀 CPU Performance
🧠 Memory Performance
|
|
✅ Tests 🎉 All green!❄️ No new flaky tests detected 🎯 Code Coverage (details) 🔗 Commit SHA: f7ab50d | Docs | Datadog PR Page | Was this helpful? React with 👍/👎 or give us feedback! |
c9c4d23 to
4874044
Compare
This reverts commit 4874044.
4874044 to
5f2356e
Compare
With the current limitModification implementation, if the modifier doesn't perform any update on serverRumEvent.resource.url and the url happens to be too long, then it will be reset to undefined. This is an "undefined behavior" that creates a discrepancy when the user of the SDK doesn't provide a beforeSend in the init config. This commit solves the discrepancy by not calling sanitize on fields whose value (is a string) and hasn't been altered by the modifier.
5f2356e to
18017a5
Compare
| // and when it provides one that doesn't modify the event.context. | ||
| // This fix only works for strings, for example when the event.context is untouched, | ||
| // it will still go through the sanitize process. It's acceptable given the unlikely eventuality. |
There was a problem hiding this comment.
❓ question: I don't get what you have in mind around event.context. Could you clarify?
There was a problem hiding this comment.
The hard part of this "skip sanitization unless the value has changed" is to compare values that are objects, here object[field] === value. For a RumEvent the only "object" field that is modifiable by the user is context, see
browser-sdk/packages/rum-core/src/domain/assembly.ts
Lines 27 to 29 in d8c5490
There was a problem hiding this comment.
oh ok, I think I get it now.
With the current implementation, If the field is an object, updating a value of this object won't trigger a sanitize but updating the object reference will 🤔
What about using deepEqual when the field is an object?
This way it is more "Sanitize if it has been modified".
Regardless of the approach, we should add tests on these behaviors as well.
There was a problem hiding this comment.
With the current implementation, If the field is an object, updating a value of this object won't trigger a sanitize but updating the object reference will 🤔
The object[field] === value will never be true here for fieldType = "object", because we end up comparing the original object field with the one from the deep clone.
What about using deepEqual when the field is an object?
This way it is more "Sanitize if it has been modified".
I am not sure it's worth to do the deepEqual check, let's review this together.
It's not free.
deepEqual can be an expensive operation:
- at runtime, both in time and memory, depending on the shape of the object; plus we would apply it on every event
- implementation wise: we introduce more complexity
I think the bundle size increase would be insignificant here.
What's the value?
If we don't do the deepEqual, the following awkward flow might happen:
beforeSend() in config > "cloneContext" is unsafe (larger than 220 Kib) > beforeSend made no update to cloneContext > Unexpected Modified RumEvent Context
This is a bad outcome because the context of the RumEvent has been modified implicitly.
I think it is mitigated:
- how likely is it to happen?
- the event will likely be discarded by the transport layer (as it would have been originally)
What do you think?
There was a problem hiding this comment.
The object[field] === value will never be true here for fieldType = "object", because we end up comparing the original object field with the one from the deep clone.
oh yeah, indeed.
deepEqual can be an expensive operation
agree on end user resource impact, implementation wise I may miss the complexity that you anticipate
the following awkward flow might happen
yeah, I think we can live without it.
OK for me with current implementation but:
- let's make it clear the this function does not sanitize modifiable field of types object, may be worth moving all the aded comments to the function description
- let's have a test on that behavior
Co-authored-by: Bastien Caudan <1331991+bcaudan@users.noreply.github.com>
Motivation
When a RumResourceEvent has a
resource.urlfield that is too long one of two things can happen:beforeSend is used: the
sanitize()function will returnundefinedand thelimitModificationfunction will set theresource.urlfield toundefined, effectively deleting it. This caused problem on web-ui (now fixed).beforeSend is not used: the
createBatch()function will discard the message that contains the RumResourceEvent, we're loosing data.Reflection behind the changes
What do we even fix?
Sending invalid data
The problem that surfaced to our users is now fixed, web-ui doesn't break when a RumResourceEvent has no
resource.url.The SDK shouldn't send invalid data to intake, this can be fixed in several ways:
resource.urlfield we can simply truncate it instead of setting it to undefined, then "how much do we truncate?", at lest enough for the message to intake to not be discardedundefinedby either""or{}Inconsistent behavior
Also note the SDK is behaving in an inconsistent manner:
beforeSend()in the RumInitConfiguration, events with unsafe fields (large URL for example) will be discarded by the transport layer. The event is discardedbeforeSend()that doesn't alter theevent, unsafe fields will be sanitized, which mean they will be either set to undefined (string too long), or truncated (the object is too big); The event or context is updated while the user never intended to do it.Summary
Then we have two problems to solve:
beforeSend(),We solve 1. by not erasing required fields.
We solve 2. by not applying the sanitize process to fields the user hasn't modified, this requires us to compare the original value
object[field]with the newvalueinsetNestedValue(). This is easy for string values, but the implementation might be more involved for objects and potentially slow.For this PR, I've have decided to fix 1 completely and 2 partially.
For string values, we don't go through the sanitize process if they didn't change.
For object values, we still go through it, at the risk of tempering the value if it's a large object (serialization wise), meaning the resulting object might be truncated. I think this is an acceptable outcome given the low probability of occurence (objects whose serialized version length is superior to 225 280 characters).
Changes
After second feedback round:
Changes history
After first feedback round:
When a field is erased by the sanitize function, just replace it with""or{}depending on thefieldTypeAdd a related unit testBefore feedback:
Save resource.url before beforeSend runs, so the original value can be restored if the user callback corrupts itTruncate resource.url to 32 KiB if it exceeds that limit (or was set to undefined by beforeSend), with a display.warn messageAdd two unit tests in assembly.spec.ts covering truncation both with and without a beforeSend callback.Test instructions
yarn test:unit --spec 'packages/rum-core/src/domain/limitModification.spec.ts'Checklist