Update self-supervised losses implementation and tests by surajyadav-research · Pull Request #1531 · google-deepmind/optax

surajyadav-research · 2025-12-11T15:13:19Z

Summary of Improvements #1528

This PR corrects several issues in the existing self-supervised losses (BYOL, SimSiam, DINO, Barlow Twins) and brings the implementations in line with the original papers. The previous versions were minimal but incomplete, and in several cases unsafe (silent broadcasting, missing stop-gradients, incorrect view handling, etc.).
The new implementations add correct two-view behavior, strict validation, and safer numerical handling.

BYOL

Two-view formulation
- Before: Only supported a single (q, z) direction; users had to manually build the symmetric two-view BYOL loss outside the function.
- Now: Adds single-direction and symmetric two-view BYOL via symmetric and four projections, matching the paper.
Teacher gradients
- Before: Target projections were differentiable; forgetting to apply stop_gradient silently broke BYOL.
- Now: Always applies lax.stop_gradient on target projections inside the loss.
Shape validation
- Before: No shape checks → silent broadcasting or mismatched tensors.
- Now: Validates shapes for all projection pairs, failing fast on incorrect usage.
Cosine similarity & eps handling
- Before: Reimplemented cosine similarity manually and used hard-coded finfo.eps.
- Now: Uses shared _regression.cosine_similarity and a dtype-safe, configurable eps.

SimSiam

Symmetric loss support
- Before: Only computed D(p1, z2); symmetric SimSiam required manual assembly.
- Now: Supports single-direction and symmetric two-view SimSiam in one API.
Stop-gradient enforcement
- Before: Relied on the caller to apply stop_gradient(z); forgetting it invalidates SimSiam.
- Now: The loss internally stops gradients through the target projections.
Shape checks
- Before: No shape checks between predictor and target tensors.
- Now: Enforces matching shapes for all predictor/target pairs.
Shared cosine similarity
- Before: Manual inline cosine implementation.
- Now: Uses _regression.cosine_similarity with dtype-aligned eps.

DINO

Single-view and two-view support
- Before: Only supported a single (student, teacher) pair; cross-view teacher/student matching had to be implemented manually.
- Now: Adds single-view and symmetric two-view DINO via two_view and a second logit pair.
Teacher stop-gradient
- Before: Teacher probabilities were differentiable → gradients flowed into the EMA teacher.
- Now: Wraps teacher softmax outputs in lax.stop_gradient.
Temperature and shape validation
- Before: Did not validate shapes or require temperatures to be positive.
- Now: Checks logits shapes and asserts positive temperatures before use.
Centering & broadcasting
- Before: center subtraction relied on uncontrolled broadcasting.
- Now: Casts and broadcasts center explicitly, ensuring consistent teacher normalization across views.

Barlow Twins

Input shape & rank checks
- Before: Accepted arbitrary shapes; incorrect ranks led to meaningless correlation matrices.
- Now: Enforces matching shapes and rank-2 inputs [batch, feature_dim].
Numerically stable normalization
- Before: Used jnp.std with eps added after; less explicit numeric pathway.
- Now: Computes variance explicitly, adds eps before sqrt, and normalizes safely.
Dtype-safe hyperparameters
- Before: eps and lambda were raw Python floats (problematic in mixed precision).
- Now: Casts both to the projection dtype for consistent computation.
Clear loss decomposition
- Before: Diagonal and off-diagonal terms were computed but not clearly separated.
- Now: Explicit on-diag + off-diag formulation directly matching the Barlow Twins paper.

Tests

The test suite was updated to rigorously validate both the math and the new code paths.

BYOL

Did not test symmetric path → Now tests exact symmetric formula with a handmade cosine reference.
Only rough cosine check on random data → Now uses deterministic handmade comparison against the analytical expression.
Weak JIT check (shape only) → Now verifies the JIT’ed loss matches the mathematical ground truth.

SimSiam

Never validated symmetric mode → Added an exact symmetric test that matches the cosine-based definition.
Only tested trivial edge cases (identical/orthogonal) → Now tests the full objective with a precise reference implementation.
No strict math-level comparison → Now enforces exact equivalence between implementation and the SimSiam loss definition.

DINO

No verification of the full KL / cross-entropy formula → Now compares against a handmade softmax/log-softmax reference.
Two-view coupling never tested → Added explicit L12/L21 tests that compute both directions by hand.
No argument validation tests → Added tests that temperatures must be positive and logit shapes must match.

Barlow Twins

Did not test the full normalization + cross-correlation pipeline → Now checks against a full handmade correlation computation.
Previously only asserted non-negativity → Now validates the complete objective numerically, including on- and off-diagonal terms.
No invalid-shape testing → Added tests for rank mismatches and shape mismatches to confirm error paths.

surajyadav-research · 2025-12-18T11:41:34Z

Hi @rajasekharporeddy
This PR is ready for review. I’ve updated the docs, implementation, and added/updated tests. All CI checks are green and CLA is done. Would you mind taking a look when you get a chance?

rdyro

Thanks for the PR! I left some comments, mostly about improving the tests

rdyro · 2025-12-26T20:52:00Z

+        dtype=jnp.float32,
+    )
+
+    def testing_barlow_twins_loss(


instead of reimplementing the loss logic, can you compare e.g., that the loss is zero (and not nan) for identical inputs and non-zero otherwise?

rdyro · 2025-12-26T20:52:38Z

+    )
+    np.testing.assert_allclose(result, handmade_result, atol=1e-6)
+
+  def test_two_view_matches_handmade(self):