[NoSQL] Ensure index elements with non-`null` values by snazy · Pull Request #4018 · apache/polaris

snazy · 2026-03-18T10:08:58Z

Within the current implementation, it is possible that the iterators of the Index API return elements with null values. This may cause NPEs.

This change enforces that index elements returned by the Index API iterators can never yield null values. The actual fix is in IndexSpi using the new iterator base class. The only place where null values could "escape" from the index API was in the .(reverse)iterator() functions in IndexSpi. The iterator implementation has been updated to skip elements with null values.

While the mentione change alone would have been sufficient, it seemed better to make the null-safety guarantees visible at the API level.

Chosing Map.Entry as the type for index elements seemed to be logical at the time of writing. As both Map.Entry.getKey() and Map.Entry.getValue() are nullable, it makes it difficult to "see" potential non-nullable violations in the code base. To address this issue, the Map.Entry interface has been replaced in the API with a new Index.Element interface with @Nonnull annotated functions. The implementation internal and package-private interface IndexElement was renamed to InternalIndexElement, now implementing Index.Element.

Background information: When an "embedded" serialized index becomes too big, it is spilled out to separately stored index stripes. The embedded index then tracks the changes on top of the spilled out index stripes. Such a change can be an update (a different non-null value) or a remove. In case of a removal, the value becomes null. Such removed elements must be visible within the index implementation, but never be exposed via the API.

Within the current implementation, it is possible that the iterators of the `Index` API return elements with `null` values. This may cause NPEs. This change enforces that index elements returned by the `Index` API iterators can never yield `null` values. The actual fix is in `IndexSpi` using the new iterator base class. The only place where `null` values could "escape" from the index API was in the `.(reverse)iterator()` functions in `IndexSpi`. The iterator implementation has been updated to skip elements with `null` values. While the mentione change alone would have been sufficient, it seemed better to make the null-safety guarantees visible at the API level. Chosing `Map.Entry` as the type for index elements seemed to be logical at the time of writing. As both `Map.Entry.getKey()` and `Map.Entry.getValue()` are nullable, it makes it difficult to "see" potential non-nullable violations in the code base. To address this issue, the `Map.Entry` interface has been replaced in the API with a new `Index.Element` interface with `@Nonnull` annotated functions. The implementation internal and package-private interface `IndexElement` was renamed to `InternalIndexElement`, now implementing `Index.Element`. Background information: When an "embedded" serialized index becomes too big, it is spilled out to separately stored index stripes. The embedded index then tracks the changes on top of the spilled out index stripes. Such a change can be an update (a different non-null value) or a remove. In case of a removal, the value becomes `null`. Such removed elements must be visible within the index implementation, but never be exposed via the API.

dimas-b · 2026-03-18T16:09:26Z

+      return key().equals(other.key()) && Objects.equals(valueNullable(), other.valueNullable());
+    }
+    if (o instanceof Index.Element<?> other) {
+      return key().equals(other.key()) && Objects.equals(valueNullable(), other.value());


If the comparison is reversed and other happens to be an IndexElem, then it will use value() from this AbstractIndexElement... so it looks like comparison is not commutative 🤔

Do we actually need polymorphic comparison for these classes?

Why not consider different sub-types as "not equal" all the time?

The comparison is commutative. Yes, the "intentionally broken contract" by InternalIndexElement makes it a bit difficult to reason about.

Internally, the indexes implementation works only with instances of InternalIndexElement, aka AbstractIndexElement. You cannot put() an *Element instance to the index.

With this, call sites can get instances of the internal representation, but as Index.Element and never with a null value.

So from a call-site the comparison is commutative.

Considering this test code:

@Test void compareWithIndexElem() { IndexKey key = IndexKey.key("test"); DirectIndexElement<Object> d = new DirectIndexElement<>(key, null); Index.Element<String> e = Index.Element.of(key, "test"); assertThat(d.equals(e)).isFalse(); assertThat(e.equals(d)).isFalse(); }

The first assertion passes, but the second assertion throws a NullPointerException. This is what I mean by non-commutative comparison.

It may not be happening in runtime now, but in general, I tend to think, it would be preferable to avoid this kind of code pattern 🤔

The test is crafted to that cannot succeed. And it's using an internal only type that is not exposed to any Index call site.

I played with this PR a bit (full Polaris Sever + Spark client) and I did not find any calls paths leading to comparison of IndexElem. Do we need these .equals() methods at all?

I'm fine merging "as is". Opened #4049 for follow-up.

dimas-b · 2026-03-18T16:17:17Z

    var u = embedded.getElement(key);
    if (u != null) {
-      return u.getValue() != null;
+      return u.valueNullable() != null;


nit: maybe add u.hasValue() and avoid nullable getters?

Hm, I'd like to avoid any unnecessary calls with in this performance sensitive part.

Is u.hasValue() less performant than u.valueNullable() != null?

My point is that if we have .hasValue() perhaps valueNullable() will not be necessary at all? WDYT?

It's an implementation internal type - so 🤷

…laris/persistence/nosql/impl/indexes/InternalIndexElement.java Co-authored-by: Alexandre Dutra <adutra@apache.org>

dimas-b · 2026-03-23T16:59:06Z

+ */
+record IndexElem<V>(IndexKey key, V value) implements Index.Element<V> {
+
+  IndexElem(@Nonnull IndexKey key, @Nonnull V value) {


I did not see any uses of this record in runtime (create/insert/select in Spark). Do we need this as a top-level class?

How about we keep the value nullable in the concrete class and non-null in the Index.Element.value() interface method?

Internal use cases can deal with the impl. class., external - with the interface.

I hope the code can be simplified this way. WDYT?

dimas-b · 2026-03-24T14:28:23Z

+      return key().equals(other.key()) && Objects.equals(valueNullable(), other.valueNullable());
+    }
+    if (o instanceof Index.Element<?> other) {
+      return key().equals(other.key()) && Objects.equals(valueNullable(), other.value());


I'm fine merging "as is". Opened #4049 for follow-up.

snazy requested a review from dimas-b March 18, 2026 10:08

github-project-automation Bot added this to Basic Kanban Board Mar 18, 2026

github-project-automation Bot moved this to PRs In Progress in Basic Kanban Board Mar 18, 2026

snazy marked this pull request as draft March 18, 2026 10:43

snazy force-pushed the nosql-idx-null-safe branch from 2da3604 to 8565b48 Compare March 18, 2026 10:47

snazy marked this pull request as ready for review March 18, 2026 10:47

adutra reviewed Mar 18, 2026

View reviewed changes

dimas-b reviewed Mar 18, 2026

View reviewed changes

snazy and others added 4 commits March 19, 2026 18:57

Update persistence/nosql/persistence/impl/src/main/java/org/apache/po…

f9f2964

…laris/persistence/nosql/impl/indexes/InternalIndexElement.java Co-authored-by: Alexandre Dutra <adutra@apache.org>

record + adapt hashCode()

31e02b6

iter

b320e9b

spotless + comment

f4546cc

snazy requested review from adutra and dimas-b March 19, 2026 18:15

dimas-b reviewed Mar 23, 2026

View reviewed changes

dimas-b mentioned this pull request Mar 24, 2026

Asymmetric equals methods in Index.Element implementations #4049

Closed

dimas-b approved these changes Mar 24, 2026

View reviewed changes

github-project-automation Bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board Mar 24, 2026

snazy merged commit 83200d5 into apache:main Mar 25, 2026
18 checks passed

github-project-automation Bot moved this from Ready to merge to Done in Basic Kanban Board Mar 25, 2026

snazy deleted the nosql-idx-null-safe branch March 25, 2026 11:08

Conversation

snazy commented Mar 18, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dimas-b Mar 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

dimas-b Mar 23, 2026 •

edited

Loading