Skip to content

[SPARK-56089][SQL] Align asinh/acosh with fdlibm algorithm for cross-engine compatibility#54912

Open
xiaoxuandev wants to merge 1 commit intoapache:masterfrom
xiaoxuandev:fix-56089
Open

[SPARK-56089][SQL] Align asinh/acosh with fdlibm algorithm for cross-engine compatibility#54912
xiaoxuandev wants to merge 1 commit intoapache:masterfrom
xiaoxuandev:fix-56089

Conversation

@xiaoxuandev
Copy link
Contributor

What changes were proposed in this pull request?

Replace the naive log(x + sqrt(x^2 ± 1)) formula in Asinh and Acosh with the full fdlibm algorithm (s_asinh.c / e_acosh.c), as implemented in OpenJDK FdLibm.java.

For Acosh (5 branches):

  • x < 1: NaN
  • x >= 2^28: log(x) + log(2)
  • x == 1: 0.0
  • 2 < x < 2^28: log(2x - 1/(x + sqrt(x*x - 1)))
  • 1 < x < 2: log1p(t + sqrt(2*t + t*t))

For Asinh (4 branches):

  • |x| < 2^-28: x (identity)
  • |x| > 2^28: sign(x) * (log(|x|) + log(2))
  • |x| > 2: sign(x) * log(2|x| + 1/(|x| + sqrt(x*x + 1)))
  • otherwise: sign(x) * log1p(|x| + x*x/(1 + sqrt(1 + x*x)))

Why are the changes needed?

The fdlibm algorithm (s_asinh.c / e_acosh.c) is the de facto standard used by OpenJDK, glibc, musl, Go, Python, PostgreSQL, and DuckDB. Spark's previous naive formula log(x + sqrt(x^2 ± 1)) produces results that differ by 1-2 ULP from these platforms for certain input ranges, causing cross-engine inconsistencies.

The fdlibm algorithm also avoids catastrophic cancellation near x=1 (acosh) and x=0 (asinh) by using range-specific formulas (log1p, identity).

This also changes the large-value threshold from sqrt(Double.MaxValue) to 2^28, matching the fdlibm standard.

Does this PR introduce any user-facing change?

Results may differ by 1-2 ULP for inputs in the (1, 2] range (acosh) and [-2, 2] range (asinh) due to the more precise formulas. The new results are closer to the mathematically exact values and consistent with other platforms.

How was this patch tested?

Updated tests in MathExpressionsSuite with fdlibm-aligned reference functions, plus new tests covering all branches with hardcoded reference values cross-verified against C libm.

Was this patch authored or co-authored using generative AI tooling?

Yes, co-authored with Kiro.

struct<ASINH(1):double>
-- !query output
0.8813735870195429
0.881373587019543
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked with PostgreSQL.

} else if (ax > 2.0) {
StrictMath.log(2.0 * ax + 1.0 / (math.sqrt(x * x + 1.0) + ax))
} else {
StrictMath.log1p(ax + x * x / (1.0 + math.sqrt(1.0 + x * x)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

val t = x - 1.0?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

? here in fdmlib t = x * x

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, sorry, yes x * x

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you both, fixed!

Copy link
Contributor

@peter-toth peter-toth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, I have only a nit.

@peter-toth
Copy link
Contributor

cc @sarutak, @mgaido91

Copy link
Contributor

@mgaido91 mgaido91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a couple of question and one suggestion

// fdlibm s_asinh.c algorithm
val ax = Math.abs(x)
val w = if (ax.isInfinite || ax.isNaN) {
ax
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in fbmlib they have x + x here IIUC. I do not see the rationale though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fdlibm, x + x for the inf/NaN case serves two purposes: (1) it propagates signaling NaN signals in C, and (2) it preserves the sign of infinity (since asinh is an odd function).
In our implementation, we take Math.abs(x) which strips the sign, but Math.copySign(w, x) at the end restores it correctly for both +Inf and -Inf. For NaN, Math.copySign(NaN, x) still returns NaN. The signaling NaN distinction doesn't apply in the JVM, so the result is equivalent, just a different way to achieve the same thing.

} else if (ax > 2.0) {
StrictMath.log(2.0 * ax + 1.0 / (math.sqrt(x * x + 1.0) + ax))
} else {
StrictMath.log1p(ax + x * x / (1.0 + math.sqrt(1.0 + x * x)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

? here in fdmlib t = x * x

|} else if (ax > 2.0) {
| w = $sm.log(2.0 * ax + 1.0 / (java.lang.Math.sqrt($c * $c + 1.0) + ax));
|} else {
| w = $sm.log1p(ax + $c * $c / (1.0 + java.lang.Math.sqrt(1.0 + $c * $c)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| w = $sm.log1p(ax + $c * $c / (1.0 + java.lang.Math.sqrt(1.0 + $c * $c)));
| double t = $c * $c;
| w = $sm.log1p(ax + t / (1.0 + java.lang.Math.sqrt(1.0 + t)));

extends UnaryMathExpression((x: Double) => {
// fdlibm e_acosh.c algorithm
if (x < 1.0) {
Double.NaN
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, as a question, do you know why fdmlib here has (x - x) / (x - x)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(x - x) / (x - x) is an fdlibm idiom to produce NaN while also raising the IEEE 754 "invalid operation" exception signal. In C, code can detect this via fetestexcept(FE_INVALID). The JVM has no IEEE 754 exception flag mechanism, (x - x) / (x - x) and Double.NaN are functionally identical in Java/Scala. So we use Double.NaN here for clarity.

…engine compatibility

### What changes were proposed in this pull request?

Replace the naive `log(x + sqrt(x^2 ± 1))` formula in `Asinh` and `Acosh` with the full fdlibm algorithm (`s_asinh.c` / `e_acosh.c`), as implemented in OpenJDK `FdLibm.java`.

For `Acosh` (5 branches):
- `x < 1`: NaN
- `x >= 2^28`: `log(x) + log(2)`
- `x == 1`: `0.0`
- `2 < x < 2^28`: `log(2x - 1/(x + sqrt(x*x - 1)))`
- `1 < x < 2`: `log1p(t + sqrt(2*t + t*t))`

For `Asinh` (4 branches):
- `|x| < 2^-28`: `x` (identity)
- `|x| > 2^28`: `sign(x) * (log(|x|) + log(2))`
- `|x| > 2`: `sign(x) * log(2|x| + 1/(|x| + sqrt(x*x + 1)))`
- otherwise: `sign(x) * log1p(|x| + x*x/(1 + sqrt(1 + x*x)))`

### Why are the changes needed?

The fdlibm algorithm (`s_asinh.c` / `e_acosh.c`) is the de facto standard used by OpenJDK, glibc, musl, Go, Python, PostgreSQL, and DuckDB. Spark's previous naive formula `log(x + sqrt(x^2 ± 1))` produces results that differ by 1-2 ULP from these platforms for certain input ranges, causing cross-engine inconsistencies.

The fdlibm algorithm also avoids catastrophic cancellation near x=1 (acosh) and x=0 (asinh) by using range-specific formulas (log1p, identity).

This also changes the large-value threshold from `sqrt(Double.MaxValue)` to `2^28`, matching the fdlibm standard.

### Does this PR introduce _any_ user-facing change?

Results may differ by 1-2 ULP for inputs in the `(1, 2]` range (acosh) and `[-2, 2]` range (asinh) due to the more precise formulas. The new results are closer to the mathematically exact values and consistent with other platforms.

### How was this patch tested?

Updated tests in `MathExpressionsSuite` with fdlibm-aligned reference functions, plus new tests covering all branches with hardcoded reference values cross-verified against C libm.

### Was this patch authored or co-authored using generative AI tooling?

Yes, co-authored with Kiro.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants