[SPARK-56089][SQL] Align asinh/acosh with fdlibm algorithm for cross-engine compatibility by xiaoxuandev · Pull Request #54912 · apache/spark

xiaoxuandev · 2026-03-19T23:41:09Z

What changes were proposed in this pull request?

Replace the naive log(x + sqrt(x^2 ± 1)) formula in Asinh and Acosh with the full fdlibm algorithm (s_asinh.c / e_acosh.c), as implemented in OpenJDK FdLibm.java.

For Acosh (5 branches):

x < 1: NaN
x >= 2^28: log(x) + log(2)
x == 1: 0.0
2 < x < 2^28: log(2x - 1/(x + sqrt(x*x - 1)))
1 < x < 2: log1p(t + sqrt(2*t + t*t))

For Asinh (4 branches):

|x| < 2^-28: x (identity)
|x| > 2^28: sign(x) * (log(|x|) + log(2))
|x| > 2: sign(x) * log(2|x| + 1/(|x| + sqrt(x*x + 1)))
otherwise: sign(x) * log1p(|x| + x*x/(1 + sqrt(1 + x*x)))

Why are the changes needed?

The fdlibm algorithm (s_asinh.c / e_acosh.c) is the de facto standard used by OpenJDK, glibc, musl, Go, Python, PostgreSQL, and DuckDB. Spark's previous naive formula log(x + sqrt(x^2 ± 1)) produces results that differ by 1-2 ULP from these platforms for certain input ranges, causing cross-engine inconsistencies.

The fdlibm algorithm also avoids catastrophic cancellation near x=1 (acosh) and x=0 (asinh) by using range-specific formulas (log1p, identity).

This also changes the large-value threshold from sqrt(Double.MaxValue) to 2^28, matching the fdlibm standard.

Does this PR introduce any user-facing change?

Results may differ by 1-2 ULP for inputs in the (1, 2] range (acosh) and [-2, 2] range (asinh) due to the more precise formulas. The new results are closer to the mathematically exact values and consistent with other platforms.

How was this patch tested?

Updated tests in MathExpressionsSuite with fdlibm-aligned reference functions, plus new tests covering all branches with hardcoded reference values cross-verified against C libm.

Was this patch authored or co-authored using generative AI tooling?

Yes, co-authored with Kiro.

peter-toth · 2026-03-20T16:24:32Z

sql/core/src/test/resources/sql-tests/results/postgreSQL/float8.sql.out

 struct<ASINH(1):double>
 -- !query output
-0.8813735870195429
+0.881373587019543


Checked with PostgreSQL.

peter-toth · 2026-03-20T16:25:53Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala

+    } else if (ax > 2.0) {
+      StrictMath.log(2.0 * ax + 1.0 / (math.sqrt(x * x + 1.0) + ax))
+    } else {
+      StrictMath.log1p(ax + x * x / (1.0 + math.sqrt(1.0 + x * x)))


val t = x - 1.0?

? here in fdmlib t = x * x

oh, sorry, yes x * x

Thank you both, fixed!

peter-toth

Looks good to me, I have only a nit.

peter-toth · 2026-03-20T16:28:06Z

cc @sarutak, @mgaido91

mgaido91

just a couple of question and one suggestion

mgaido91 · 2026-03-20T18:36:15Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala

+    // fdlibm s_asinh.c algorithm
+    val ax = Math.abs(x)
+    val w = if (ax.isInfinite || ax.isNaN) {
+      ax


in fbmlib they have x + x here IIUC. I do not see the rationale though.

In fdlibm, x + x for the inf/NaN case serves two purposes: (1) it propagates signaling NaN signals in C, and (2) it preserves the sign of infinity (since asinh is an odd function).
In our implementation, we take Math.abs(x) which strips the sign, but Math.copySign(w, x) at the end restores it correctly for both +Inf and -Inf. For NaN, Math.copySign(NaN, x) still returns NaN. The signaling NaN distinction doesn't apply in the JVM, so the result is equivalent, just a different way to achieve the same thing.

mgaido91 · 2026-03-20T18:39:37Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala

+    } else if (ax > 2.0) {
+      StrictMath.log(2.0 * ax + 1.0 / (math.sqrt(x * x + 1.0) + ax))
+    } else {
+      StrictMath.log1p(ax + x * x / (1.0 + math.sqrt(1.0 + x * x)))


? here in fdmlib t = x * x

mgaido91 · 2026-03-20T18:40:26Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala

+         |} else if (ax > 2.0) {
+         |  w = $sm.log(2.0 * ax + 1.0 / (java.lang.Math.sqrt($c * $c + 1.0) + ax));
+         |} else {
+         |  w = $sm.log1p(ax + $c * $c / (1.0 + java.lang.Math.sqrt(1.0 + $c * $c)));


Suggested change

| w = $sm.log1p(ax + $c * $c / (1.0 + java.lang.Math.sqrt(1.0 + $c * $c)));

| double t = $c * $c;

| w = $sm.log1p(ax + t / (1.0 + java.lang.Math.sqrt(1.0 + t)));

mgaido91 · 2026-03-20T18:45:54Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala

+  extends UnaryMathExpression((x: Double) => {
+    // fdlibm e_acosh.c algorithm
+    if (x < 1.0) {
      Double.NaN


again, as a question, do you know why fdmlib here has (x - x) / (x - x)?

(x - x) / (x - x) is an fdlibm idiom to produce NaN while also raising the IEEE 754 "invalid operation" exception signal. In C, code can detect this via fetestexcept(FE_INVALID). The JVM has no IEEE 754 exception flag mechanism, (x - x) / (x - x) and Double.NaN are functionally identical in Java/Scala. So we use Double.NaN here for clarity.

…engine compatibility ### What changes were proposed in this pull request? Replace the naive `log(x + sqrt(x^2 ± 1))` formula in `Asinh` and `Acosh` with the full fdlibm algorithm (`s_asinh.c` / `e_acosh.c`), as implemented in OpenJDK `FdLibm.java`. For `Acosh` (5 branches): - `x < 1`: NaN - `x >= 2^28`: `log(x) + log(2)` - `x == 1`: `0.0` - `2 < x < 2^28`: `log(2x - 1/(x + sqrt(x*x - 1)))` - `1 < x < 2`: `log1p(t + sqrt(2*t + t*t))` For `Asinh` (4 branches): - `|x| < 2^-28`: `x` (identity) - `|x| > 2^28`: `sign(x) * (log(|x|) + log(2))` - `|x| > 2`: `sign(x) * log(2|x| + 1/(|x| + sqrt(x*x + 1)))` - otherwise: `sign(x) * log1p(|x| + x*x/(1 + sqrt(1 + x*x)))` ### Why are the changes needed? The fdlibm algorithm (`s_asinh.c` / `e_acosh.c`) is the de facto standard used by OpenJDK, glibc, musl, Go, Python, PostgreSQL, and DuckDB. Spark's previous naive formula `log(x + sqrt(x^2 ± 1))` produces results that differ by 1-2 ULP from these platforms for certain input ranges, causing cross-engine inconsistencies. The fdlibm algorithm also avoids catastrophic cancellation near x=1 (acosh) and x=0 (asinh) by using range-specific formulas (log1p, identity). This also changes the large-value threshold from `sqrt(Double.MaxValue)` to `2^28`, matching the fdlibm standard. ### Does this PR introduce _any_ user-facing change? Results may differ by 1-2 ULP for inputs in the `(1, 2]` range (acosh) and `[-2, 2]` range (asinh) due to the more precise formulas. The new results are closer to the mathematically exact values and consistent with other platforms. ### How was this patch tested? Updated tests in `MathExpressionsSuite` with fdlibm-aligned reference functions, plus new tests covering all branches with hardcoded reference values cross-verified against C libm. ### Was this patch authored or co-authored using generative AI tooling? Yes, co-authored with Kiro.

peter-toth reviewed Mar 20, 2026

View reviewed changes

peter-toth approved these changes Mar 20, 2026

View reviewed changes

mgaido91 reviewed Mar 20, 2026

View reviewed changes

xiaoxuandev force-pushed the fix-56089 branch from f81b6f7 to 5fe817c Compare March 20, 2026 20:04

	\| w = $sm.log1p(ax + $c * $c / (1.0 + java.lang.Math.sqrt(1.0 + $c * $c)));
	\| double t = $c * $c;
	\| w = $sm.log1p(ax + t / (1.0 + java.lang.Math.sqrt(1.0 + t)));

Conversation

xiaoxuandev commented Mar 19, 2026

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

peter-toth left a comment

Choose a reason for hiding this comment

Uh oh!

peter-toth commented Mar 20, 2026

Uh oh!

mgaido91 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants