[SPARK-XXXXX][SQL] Guard against unresolved attributes in constraint inference #54921

Open
mihailotim-db wants to merge 1 commit into apache:master from mihailotim-db:fix_null_constraint

mihailotim-db (Contributor) commented Mar 20, 2026

What changes were proposed in this pull request?

Guard against UnresolvedAttribute in ConstraintHelper.constructIsNotNullConstraints by checking Attribute.resolved before accessing Attribute.nullable.

The change is a one-line fix in QueryPlanConstraints.scala:

Before:

val nonNullableAttributes = output.filterNot(_.nullable)

After:

val nonNullableAttributes = output.filter(a => a.resolved && !a.nullable)

Why are the changes needed?

constructIsNotNullConstraints is called from the constraints lazy val on LogicalPlan, which is first evaluated during optimization (e.g., by InferFiltersFromConstraints). If a plan node's output contains an UnresolvedAttribute, calling .nullable on it throws UnresolvedException, because UnresolvedAttribute.nullable is defined as:

override def nullable: Boolean = throw new UnresolvedException("nullable")

An UnresolvedAttribute can appear in a plan's output when Alias.toAttribute is called on an unresolved Alias:

// namedExpressions.scala
override def toAttribute: Attribute = {
  if (resolved) {
    AttributeReference(name, child.dataType, child.nullable, metadata)(exprId, qualifier)
  } else {
    UnresolvedAttribute.quoted(name)  // <-- this leaks into Project.output
  }
}

Since Project.output is computed as projectList.map(_.toAttribute), any unresolved Alias in the project list produces an UnresolvedAttribute in the output. This can occur when plans are constructed through non-standard paths (e.g., Spark Connect plan deserialization) where the plan may not be fully analyzed before optimizer rules access the constraints lazy val.
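The leak described above can be reproduced on a toy model. The sketch below does not use Catalyst: `LeakSketch`, `Child`, and the simplified `Alias`, `Project`, `ResolvedAttr`, and `UnresolvedAttr` are hypothetical stand-ins that only mirror the shape of `toAttribute` and `Project.output` as quoted in this description.

```scala
// Toy model only: these are simplified stand-ins, not Spark's Catalyst classes.
object LeakSketch {
  sealed trait Attr {
    def name: String
    def resolved: Boolean
  }

  final case class ResolvedAttr(name: String, nullable: Boolean) extends Attr {
    val resolved: Boolean = true
  }

  final case class UnresolvedAttr(name: String) extends Attr {
    val resolved: Boolean = false
  }

  // Hypothetical child expression: either fully analyzed or still unresolved.
  final case class Child(resolved: Boolean, nullable: Boolean)

  final case class Alias(child: Child, name: String) {
    def resolved: Boolean = child.resolved

    // Same branching as the toAttribute snippet quoted above.
    def toAttribute: Attr =
      if (resolved) ResolvedAttr(name, child.nullable)
      else UnresolvedAttr(name) // this is the value that leaks into the output
  }

  final case class Project(projectList: Seq[Alias]) {
    // Mirrors projectList.map(_.toAttribute).
    def output: Seq[Attr] = projectList.map(_.toAttribute)
  }
}
```

In this model, a single unresolved alias in the project list is enough to place an unresolved attribute in `output`, which is the precondition for the failure shown in the stack trace.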

The stack trace:

UnresolvedException: [INTERNAL_ERROR] Invalid call to nullable on an unresolved named expression
  at UnresolvedAttribute.nullable(unresolved.scala:330)
  at ConstraintHelper.constructIsNotNullConstraints(QueryPlanConstraints.scala:103)
  at QueryPlanConstraints.constraints(QueryPlanConstraints.scala:36)
  at InferFiltersFromConstraints.inferFilters(Optimizer.scala:...)

The fix is safe because the nullability of an unresolved attribute is unknown, so nothing can be inferred about it. Skipping such attributes during IsNotNull constraint inference is therefore semantically correct, and for fully resolved output the new filter behaves exactly as the old one.
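The safety argument can be checked on a small model. This is a minimal sketch under stated assumptions, not the actual Catalyst code: `GuardSketch`, `Attr`, `nullableIfResolved`, `before`, and `after` are hypothetical names, and the exception type is a plain `IllegalStateException` standing in for `UnresolvedException`. Only the two filter expressions are taken from this PR.

```scala
// Toy model only: a hypothetical attribute whose nullability is defined
// only once it is resolved, as described for UnresolvedAttribute above.
object GuardSketch {
  final case class Attr(name: String, resolved: Boolean, nullableIfResolved: Boolean) {
    def nullable: Boolean =
      if (resolved) nullableIfResolved
      else throw new IllegalStateException(s"Invalid call to nullable on unresolved attribute $name")
  }

  // The original expression: touches .nullable on every attribute.
  def before(output: Seq[Attr]): Seq[Attr] =
    output.filterNot(_.nullable)

  // The guarded expression: && short-circuits, so .nullable is never
  // evaluated for an unresolved attribute.
  def after(output: Seq[Attr]): Seq[Attr] =
    output.filter(a => a.resolved && !a.nullable)
}
```

On fully resolved input the two functions return the same attributes, so resolved plans are unaffected. On mixed input, `before` throws at the first unresolved attribute, while `after` returns only the resolved, non-nullable ones.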

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Two new tests were added:

  1. ConstraintPropagationSuite: a unit test that directly validates that constructIsNotNullConstraints handles a mix of resolved and unresolved attributes in the output without crashing, and infers IsNotNull only for the resolved non-nullable attributes.
  2. InferFiltersFromConstraintsSuite: an optimizer-level regression test that constructs a Filter on top of a Project with an unresolved Alias (simulating the production plan topology) and applies InferFiltersFromConstraints directly. Without the fix, the test fails with the exact production stack trace; with the fix, it passes.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (claude-opus-4-6)

dongjoon-hyun marked this pull request as draft March 20, 2026 13:34
mihailotim-db changed the title from "fix" to "[SPARK-XXXXX][SQL] Guard against unresolved attributes in constraint inference" Mar 20, 2026
mihailotim-db marked this pull request as ready for review March 20, 2026 13:36