[NoSQL] Ensure index elements with non-null values#4018
Conversation
Within the current implementation, it is possible that the iterators of the `Index` API return elements with `null` values. This may cause NPEs. This change enforces that index elements returned by the `Index` API iterators can never yield `null` values. The actual fix is in `IndexSpi` using the new iterator base class. The only place where `null` values could "escape" from the index API was in the `.(reverse)iterator()` functions in `IndexSpi`. The iterator implementation has been updated to skip elements with `null` values. While the mentione change alone would have been sufficient, it seemed better to make the null-safety guarantees visible at the API level. Chosing `Map.Entry` as the type for index elements seemed to be logical at the time of writing. As both `Map.Entry.getKey()` and `Map.Entry.getValue()` are nullable, it makes it difficult to "see" potential non-nullable violations in the code base. To address this issue, the `Map.Entry` interface has been replaced in the API with a new `Index.Element` interface with `@Nonnull` annotated functions. The implementation internal and package-private interface `IndexElement` was renamed to `InternalIndexElement`, now implementing `Index.Element`. Background information: When an "embedded" serialized index becomes too big, it is spilled out to separately stored index stripes. The embedded index then tracks the changes on top of the spilled out index stripes. Such a change can be an update (a different non-null value) or a remove. In case of a removal, the value becomes `null`. Such removed elements must be visible within the index implementation, but never be exposed via the API.
2da3604 to
8565b48
Compare
| return key().equals(other.key()) && Objects.equals(valueNullable(), other.valueNullable()); | ||
| } | ||
| if (o instanceof Index.Element<?> other) { | ||
| return key().equals(other.key()) && Objects.equals(valueNullable(), other.value()); |
There was a problem hiding this comment.
If the comparison is reversed and other happens to be an IndexElem, then it will use value() from this AbstractIndexElement... so it looks like comparison is not commutative 🤔
There was a problem hiding this comment.
Do we actually need polymorphic comparison for these classes?
Why not consider different sub-types as "not equal" all the time?
There was a problem hiding this comment.
The comparison is commutative. Yes, the "intentionally broken contract" by InternalIndexElement makes it a bit difficult to reason about.
Internally, the indexes implementation works only with instances of InternalIndexElement, aka AbstractIndexElement. You cannot put() an *Element instance to the index.
With this, call sites can get instances of the internal representation, but as Index.Element and never with a null value.
So from a call-site the comparison is commutative.
There was a problem hiding this comment.
Considering this test code:
@Test
void compareWithIndexElem() {
IndexKey key = IndexKey.key("test");
DirectIndexElement<Object> d = new DirectIndexElement<>(key, null);
Index.Element<String> e = Index.Element.of(key, "test");
assertThat(d.equals(e)).isFalse();
assertThat(e.equals(d)).isFalse();
}
The first assertion passes, but the second assertion throws a NullPointerException. This is what I mean by non-commutative comparison.
It may not be happening in runtime now, but in general, I tend to think, it would be preferable to avoid this kind of code pattern 🤔
There was a problem hiding this comment.
The test is crafted to that cannot succeed. And it's using an internal only type that is not exposed to any Index call site.
There was a problem hiding this comment.
I played with this PR a bit (full Polaris Sever + Spark client) and I did not find any calls paths leading to comparison of IndexElem. Do we need these .equals() methods at all?
There was a problem hiding this comment.
I'm fine merging "as is". Opened #4049 for follow-up.
| var u = embedded.getElement(key); | ||
| if (u != null) { | ||
| return u.getValue() != null; | ||
| return u.valueNullable() != null; |
There was a problem hiding this comment.
nit: maybe add u.hasValue() and avoid nullable getters?
There was a problem hiding this comment.
Hm, I'd like to avoid any unnecessary calls with in this performance sensitive part.
There was a problem hiding this comment.
Is u.hasValue() less performant than u.valueNullable() != null?
There was a problem hiding this comment.
My point is that if we have .hasValue() perhaps valueNullable() will not be necessary at all? WDYT?
There was a problem hiding this comment.
It's an implementation internal type - so 🤷
…laris/persistence/nosql/impl/indexes/InternalIndexElement.java Co-authored-by: Alexandre Dutra <adutra@apache.org>
| */ | ||
| record IndexElem<V>(IndexKey key, V value) implements Index.Element<V> { | ||
|
|
||
| IndexElem(@Nonnull IndexKey key, @Nonnull V value) { |
There was a problem hiding this comment.
I did not see any uses of this record in runtime (create/insert/select in Spark). Do we need this as a top-level class?
There was a problem hiding this comment.
How about we keep the value nullable in the concrete class and non-null in the Index.Element.value() interface method?
Internal use cases can deal with the impl. class., external - with the interface.
I hope the code can be simplified this way. WDYT?
| return key().equals(other.key()) && Objects.equals(valueNullable(), other.valueNullable()); | ||
| } | ||
| if (o instanceof Index.Element<?> other) { | ||
| return key().equals(other.key()) && Objects.equals(valueNullable(), other.value()); |
There was a problem hiding this comment.
I'm fine merging "as is". Opened #4049 for follow-up.
Within the current implementation, it is possible that the iterators of the
IndexAPI return elements withnullvalues. This may cause NPEs.This change enforces that index elements returned by the
IndexAPI iterators can never yieldnullvalues. The actual fix is inIndexSpiusing the new iterator base class. The only place wherenullvalues could "escape" from the index API was in the.(reverse)iterator()functions inIndexSpi. The iterator implementation has been updated to skip elements withnullvalues.While the mentione change alone would have been sufficient, it seemed better to make the null-safety guarantees visible at the API level.
Chosing
Map.Entryas the type for index elements seemed to be logical at the time of writing. As bothMap.Entry.getKey()andMap.Entry.getValue()are nullable, it makes it difficult to "see" potential non-nullable violations in the code base. To address this issue, theMap.Entryinterface has been replaced in the API with a newIndex.Elementinterface with@Nonnullannotated functions. The implementation internal and package-private interfaceIndexElementwas renamed toInternalIndexElement, now implementingIndex.Element.Background information: When an "embedded" serialized index becomes too big, it is spilled out to separately stored index stripes. The embedded index then tracks the changes on top of the spilled out index stripes. Such a change can be an update (a different non-null value) or a remove. In case of a removal, the value becomes
null. Such removed elements must be visible within the index implementation, but never be exposed via the API.