[CALCITE-7393] Support RelDataTypeDigest#4761
[CALCITE-7393] Support RelDataTypeDigest#4761zhuwenzhuang wants to merge 2 commits intoapache:mainfrom
Conversation
604cf22 to
04538b2
Compare
|
Thank you for your contribution! I have a few suggestions: |
| /** | ||
| * Digest of a RelDataType. | ||
| */ | ||
| public interface RelDataTypeDigest { |
There was a problem hiding this comment.
This could be two interfaces, in case Digest is useful somewhere else.
There was a problem hiding this comment.
I noticed that the hashCode caching in RelDigest and RexCall already addresses some of the hashCode caching/comparison latency issues. Could you please clarify what you mean by “be two interfaces”?
There was a problem hiding this comment.
interface HasDigestString {
String getDigestString()
}
interface RelDataTypeDigest extends HasDigestString {
RelDataType getType();
}
Other classes may implement HasDigestString.
| } | ||
|
|
||
| @Override public boolean equals(@Nullable Object obj) { | ||
| @Override public boolean deepEquals(Object obj) { |
There was a problem hiding this comment.
I think you'll need to leave the old implementations around too, they may break users who expect them to exist.
There was a problem hiding this comment.
If we allow user to implement equals/hashCode for nested type, cache for hashCode and pointer compare for deepEquals will not work for them.
This situation is similiar to https://issues.apache.org/jira/browse/CALCITE-3786
|
|
||
| protected final @Nullable List<RelDataTypeField> fieldList; | ||
| protected @Nullable String digest; | ||
| protected RelDataTypeDigest digest; |
There was a problem hiding this comment.
Changing protected fields is a breaking change, which has to be reflected in the semantic versioning. This would require a 2.0 release.
There was a problem hiding this comment.
On the other hand, I don't know what else you can do to make the digest lazy. Maybe other people can weigh on this dilemma. This is why you should probably file a JIRA issue just for this topic and discuss the design there.
There was a problem hiding this comment.
Maybe we can allow user to set String digest to define their own digest. otherwise, we use the RelDataTypeDigest
There was a problem hiding this comment.
My suggestion is that some protected/public variables and methods must still be guaranteed to work properly. For example, this
"String digest" might eventually call
"toString" from
"RelDataTypeDigest" to get the value? Because these variables are exposed to users, and someone might currently be using them, we need to provide a compatibility fix. If you think reassigning the original variable would go against your design intention (e.g., it might increase memory usage instead of reducing it), then you need to declare that this is a breaking change.
04538b2 to
a03c48a
Compare
1f17a42 to
955f537
Compare
955f537 to
f33c6a2
Compare
ac7bf74 to
078f2e4
Compare
|
I don't know how to review this; the change is reasonable in itself, but I don't know if this is an acceptable breaking change. I would appreciate comments from people who have dealt with similar breaking changes in the past. |
xiedeyantu
left a comment
There was a problem hiding this comment.
I think this change has a fairly wide impact. I'm wondering if it's possible to control the new/old logic by introducing configuration options. To be honest, I can't judge at this point how the current changes will affect future compatibility. Although your changes may be good, the community probably values compatibility more.
078f2e4 to
7416868
Compare
|
@xiedeyantu |
Use structured innerDigest instead of string digest for composite/UDT types to reduce memory and improve hashCode/equals latency. Controlled by `calcite.disable.generate.type.digest.string` (default: false). Legacy string digest is still used in hashCode/equals if explicitly set, ensuring backward compatibility. TestCase: TypeDigestBenchmark
7416868 to
38fa020
Compare
|
In general you should avoid force-pushing until a review is finished, to make the reviewer's job easier by making it clear what is new. |
mihaibudiu
left a comment
There was a problem hiding this comment.
This looks pretty good. Let's see if anyone else has comments.
|
|
||
| protected final @Nullable List<RelDataTypeField> fieldList; | ||
| protected @Nullable String digest; | ||
| protected @Deprecated @Nullable String digest; |
There was a problem hiding this comment.
Please add also a @deprecated JavaDoc comment explaining what users are supposed to do instead.
There was a problem hiding this comment.
Should we suggest using RelDataTypeDigest.getDigest().getDigestString() instead of digest? I also have a question: if a user extends and uses this field, will the current modification block the upgrade? Is this considered a blocking point?
There was a problem hiding this comment.
1.Yes. It has better performace and ondemand construction behavior.
2.Not a blocking point, if user set the String digest not null, all hashCode/equals/getXXXString logic will fallback to behaviors based on String digest.
There was a problem hiding this comment.
My understanding is no. If you use this field and mark it with @Deprecated, the compiler will definitely throw an error. You are returning a string using RelDataTypeDigest.getDigest().getDigestString(), instead of directly using digest.
There was a problem hiding this comment.
My first meaning is to add a description like "Use RelDataTypeDigest.getDigest().getDigestString() instead of digest" to the Javadoc.
There was a problem hiding this comment.
OK, @depredate mark may cause user's compiler thorw an error. And I think use RelDataTypeDigest to handle digest is better than String digest. I don't know should we add the @depredate mark to indicate this suggestion.
For the java doc.
"Use RelDataTypeDigest.getDigest().getDigestString() instead of digest" is one usage of digest. If users managed the String digest by not standard way(maybe UDT)
digest = "MyType";, they need implement this in
void generateTypeString(StringBuilder sb,boolean withDetail);I'm not sure whether we need to add this to the Javadoc, either.
There was a problem hiding this comment.
I believe a more detailed description would be better. While I think this breaks compatibility, I think it might be acceptable if no one objects in the next few days.
|
|
||
| /** Whether to disable generate type digest string. | ||
| * | ||
| * <p> Disable generate type digest string for every type can |
There was a problem hiding this comment.
Please be more explicit here: this is about the RelDataType.digest field.
"Type" can mean many things.
9ca8243 to
a38a2d2
Compare
a38a2d2 to
892c828
Compare
and add a deprecated comment
892c828 to
6ce4690
Compare
|
xiedeyantu
left a comment
There was a problem hiding this comment.
I don't know if any declarations are needed for the blocking point, but I think this optimization is very good.



Use RelDataTypeDigest to reduce composite type's digest memory and digest relative operation's latency.
test case: HepPlannerTest.testLargeTypeDigest
This is the first part of latency/memory optimization of large plan.