
Correctness + performance lab for embedding scatterplots (Euclidean and Poincaré disk)


Hyper3Labs/hyper-scatter


hyper-scatter

npm version License: MIT

Hyperbolic (Poincaré) embeddings at 60 FPS. 20,000,000 points. Pure WebGL2 (no regl, no three.js).

🚀 Try the Interactive Demo & Benchmark Playground →

Poincaré Disk Demo
Hyperbolic pan & zoom in the Poincaré disk — points follow geodesics

A specialized scatterplot engine for HyperView.

  • Geometries: Poincaré (hyperbolic) + Euclidean today; Spherical (S²) is a good future contribution.
  • Correctness: a slow CPU Reference defines semantics; the fast GPU Candidate must match.
  • Implementation: pure WebGL2 (no regl, no three.js, no runtime deps).

Poincaré (Hyperbolic) semantics

This is the part most scatterplot libs don’t have.

  • View state: Möbius isometry parameter $a=(a_x,a_y)$ with $|a|<1$, plus a separate displayZoom scalar.
  • Pan: anchor-invariant (the point under the cursor stays under the cursor).
  • Zoom: anchored zoom; displayZoom scales the visual disk without changing the underlying isometry.
  • Project / unproject: stable round-trips, shared between Reference + Candidate math.
  • Hit-test: hyperbolic-aware, disk-culls correctly, deterministic tie-breaking.
  • Lasso: selection polygon is unprojected to data space; membership is verified against the Reference.

For the full invariants + how the harness selects candidate code paths, see AGENTS.md.
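
The view-state, pan, and round-trip bullets above can be sketched in a few lines of complex arithmetic. This is a minimal illustration, not the library's code: it assumes the isometry is $T_a(z) = (z - a)/(1 - \bar{a}z)$ and that `displayZoom` only scales the disk radius in pixels. The names `mobius`, `project`, `unproject`, `solveAnchor`, and the `View` shape are hypothetical, and hyper-scatter's actual conventions (sign of $a$, y-axis direction) may differ.

```typescript
// Sketch only: assumes T_a(z) = (z - a) / (1 - conj(a) z). Not hyper-scatter's API.
type Vec2 = { x: number; y: number };

// Complex arithmetic on Vec2 (x = real part, y = imaginary part).
const mul = (p: Vec2, q: Vec2): Vec2 => ({ x: p.x * q.x - p.y * q.y, y: p.x * q.y + p.y * q.x });
const conj = (p: Vec2): Vec2 => ({ x: p.x, y: -p.y });
const div = (p: Vec2, q: Vec2): Vec2 => {
  const n = mul(p, conj(q));
  const d = q.x * q.x + q.y * q.y;
  return { x: n.x / d, y: n.y / d };
};

// Data space -> unit disk under the view isometry T_a.
function mobius(z: Vec2, a: Vec2): Vec2 {
  const caz = mul(conj(a), z);
  return div({ x: z.x - a.x, y: z.y - a.y }, { x: 1 - caz.x, y: -caz.y });
}

// Hypothetical view state: isometry parameter a, display zoom, disk center and radius in CSS px.
interface View { a: Vec2; displayZoom: number; cx: number; cy: number; radiusPx: number }

function project(z: Vec2, v: View): Vec2 {
  const w = mobius(z, v.a);
  const s = v.radiusPx * v.displayZoom; // zoom scales the visual disk, not the isometry
  return { x: v.cx + s * w.x, y: v.cy + s * w.y };
}

function unproject(p: Vec2, v: View): Vec2 {
  const s = v.radiusPx * v.displayZoom;
  const w = { x: (p.x - v.cx) / s, y: (p.y - v.cy) / s };
  return mobius(w, { x: -v.a.x, y: -v.a.y }); // the inverse of T_a is T_{-a}
}

// Anchor-invariant pan: solve T_a(p) = q for a, where p is the data point grabbed
// at pointer-down and q is the current cursor position in unit-disk coordinates.
// From a - c*conj(a) = d with c = q*p and d = p - q: a = (d + c*conj(d)) / (1 - |c|^2).
function solveAnchor(p: Vec2, q: Vec2): Vec2 {
  const c = mul(q, p);
  const d = { x: p.x - q.x, y: p.y - q.y };
  const cd = mul(c, conj(d));
  const k = 1 - (c.x * c.x + c.y * c.y);
  return { x: (d.x + cd.x) / k, y: (d.y + cd.y) / k };
}
```

Under these assumptions, a drag handler would call `unproject` once at pointer-down to get `p`, then set `view.a = solveAnchor(p, q)` on each move, which keeps the grabbed point under the cursor.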

Usage (copy/paste agent prompt)

You are a coding agent working in my repository.

Use these imports:

   import {
      EuclideanWebGLCandidate,
      HyperbolicWebGLCandidate,
      createDataset,
      createInteractionController,
      type SelectionResult,
   } from 'hyper-scatter';

Goal:
- Integrate `hyper-scatter` to render my embedding scatterplot.

Requirements:
1) Install:
   - npm: `npm install hyper-scatter`

2) Implement a small integration wrapper:
   - Create `mountHyperScatter(canvas, params)` (or an idiomatic React hook).
   - Pick renderer:
      - if params.geometry === 'poincare' use `new HyperbolicWebGLCandidate()`
      - else use `new EuclideanWebGLCandidate()`
   - Ensure the canvas has a real CSS size (non-zero width/height).
   - Init using CSS pixels:
      - `const rect = canvas.getBoundingClientRect()`
      - `renderer.init(canvas, { width: Math.max(1, Math.floor(rect.width)), height: Math.max(1, Math.floor(rect.height)), devicePixelRatio: window.devicePixelRatio })`
   - Dataset:
      - `renderer.setDataset(createDataset(params.geometry, params.positions, params.labels))`
   - First frame:
      - `renderer.render()`

3) Wire interactions:
   - Use `createInteractionController(canvas, renderer, { onHover, onLassoComplete })`.
   - On lasso completion, keep the returned `SelectionResult` and (optionally) call:
     - `await renderer.countSelection(result, { yieldEveryMs: 0 })` if you need an exact count without UI yielding.

4) Cleanup:
   - On unmount/destroy: `controller.destroy(); renderer.destroy();`

Deliverables:
- The concrete code changes + file paths.
- A minimal example showing how to pass `Float32Array positions` (flat [x,y,x,y,...]) and optional `Uint16Array labels`.
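
For the data-layout and sizing steps in the prompt above, a small sketch may help. The helpers `toFlatArrays` and `cssInitSize` and the `Point2D` type are hypothetical and not part of hyper-scatter; they only illustrate the flat `[x,y,x,y,...]` layout and the CSS-pixel clamping the prompt describes. Defaulting a missing label to `0` is an assumption.

```typescript
// Hypothetical helpers (not part of hyper-scatter) for preparing inputs.
interface Point2D { x: number; y: number; label?: number }

// Pack points into the flat [x, y, x, y, ...] Float32Array plus a Uint16Array of labels.
function toFlatArrays(points: Point2D[]): { positions: Float32Array; labels: Uint16Array } {
  const positions = new Float32Array(points.length * 2);
  const labels = new Uint16Array(points.length);
  points.forEach((p, i) => {
    positions[2 * i] = p.x;
    positions[2 * i + 1] = p.y;
    labels[i] = p.label ?? 0; // assumption: unlabeled points fall into class 0
  });
  return { positions, labels };
}

// Build init options in CSS pixels, clamped so a collapsed canvas never yields 0.
function cssInitSize(rect: { width: number; height: number }, devicePixelRatio: number) {
  return {
    width: Math.max(1, Math.floor(rect.width)),
    height: Math.max(1, Math.floor(rect.height)),
    devicePixelRatio,
  };
}
```

In a browser, the results would feed the calls named in the prompt: `renderer.init(canvas, cssInitSize(canvas.getBoundingClientRect(), window.devicePixelRatio))` and `createDataset(geometry, positions, labels)`.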

Benchmarks

Main claim, measured via the browser harness (headed):

Config note: canvas 1125x400 @ 1x DPR (Puppeteer).

| Geometry  | Points     | FPS (avg) |
|-----------|------------|-----------|
| Euclidean | 20,000,000 | 59.9      |
| Poincaré  | 20,000,000 | 59.9      |

Run the stress benchmark that reproduces the rows above:

npm run bench -- --points=20000000

Default sweep (smaller point counts): npm run bench

Note: for performance numbers, run headed (default). Headless runs can skew GPU timing.

How we built it

I (Matin) only knew Python. So we built this as a lab with a clear loop.

Roles:

  • Matin: architect/product (Python-first).
  • Claude: harness/environment engineer (benchmarks + correctness tests + reference semantics).
  • Codex: implementation engineer (WebGL2 candidate).

1) Reference first

  • Write non-performant, readable Canvas2D renderers (src/impl_reference/).
  • Treat them as the source of truth for semantics: projection, pan/zoom, hit-test, lasso.

2) Harness as the reward function

  • Accuracy compares Reference vs Candidate for: project/unproject, pan/zoom invariance, hit-test, lasso.
  • Performance tracks: FPS, pan/hover FPS, hit-test time, lasso time.
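
The accuracy half of this loop can be pictured as a slow, obviously correct function checked against a fast one on the same queries. The sketch below is an illustration rather than the repository's harness: `referenceHitTest` and `compareHitTests` are hypothetical names, the distance is plain Euclidean (a stand-in for the hyperbolic-aware test), and "lowest index wins on ties" is an assumed tie-breaking rule.

```typescript
// Sketch of a Reference-vs-Candidate accuracy check (hypothetical names).
type HitTest = (x: number, y: number) => number; // point index, or -1 for no hit

// Brute-force reference: nearest point within `radius`, lowest index wins exact ties.
function referenceHitTest(positions: Float32Array, radius: number): HitTest {
  const r2 = radius * radius;
  return (x, y) => {
    let best = -1;
    let bestD2 = Infinity;
    for (let i = 0; i < positions.length / 2; i++) {
      const dx = positions[2 * i] - x;
      const dy = positions[2 * i + 1] - y;
      const d2 = dx * dx + dy * dy;
      // Strict "<" means a later point at the same distance never replaces an earlier one.
      if (d2 <= r2 && d2 < bestD2) {
        best = i;
        bestD2 = d2;
      }
    }
    return best;
  };
}

interface Mismatch { query: [number, number]; expected: number; actual: number }

// Run both implementations on the same queries and report every disagreement.
function compareHitTests(ref: HitTest, cand: HitTest, queries: Array<[number, number]>): Mismatch[] {
  const mismatches: Mismatch[] = [];
  for (const query of queries) {
    const expected = ref(query[0], query[1]);
    const actual = cand(query[0], query[1]);
    if (expected !== actual) mismatches.push({ query, expected, actual });
  }
  return mismatches;
}
```

An empty result means the candidate agrees with the reference on every query, including the tie cases, which is where an indexed or GPU-based implementation is most likely to drift.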

3) Candidate optimization

  • Implement the WebGL2 candidate (src/impl_candidate/).
  • Speed comes from GPU rendering + spatial indexing + adaptive quality.

4) Reward hacking notes

If you give an agent a benchmark, it will try to win, and not always by doing the work. Shortcuts to watch for:

  • Editing the harness/tolerances instead of fixing precision.
  • Making lasso “async” so the timer looks better.

The harness tries to reduce these paths (example: lasso timing is end-to-end and includes the work required to get an exact selected-count).
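
A minimal sketch of what "end-to-end" means for timing, assuming the measured work is an async function: the timer stops only after the awaited result is available, so deferring work to a later tick does not shorten the measurement. `timeEndToEnd` is a hypothetical helper, not the harness's actual code. The injectable clock defaults to `Date.now()` for portability; in a browser, `performance.now()` would give better resolution.

```typescript
// Hypothetical end-to-end timer: the clock stops only after the awaited result exists.
async function timeEndToEnd<T>(
  work: () => Promise<T>,
  now: () => number = () => Date.now(),
): Promise<{ result: T; elapsedMs: number }> {
  const start = now();
  const result = await work(); // includes any work deferred behind awaits or yields
  return { result, elapsedMs: now() - start };
}
```

For lasso, `work` would wrap the selection plus whatever is needed to produce the exact count (for example the `countSelection` call shown in the usage prompt), so that returning early with an unfinished result gains nothing.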

Status & Roadmap

  • Euclidean Geometry: supported.
  • Poincaré Disk (Hyperbolic) Geometry: supported.
  • Spherical Geometry (S²): not yet implemented. The architecture supports it (GeometryMode enum), but the Reference math is missing. Contributions welcome.

License

MIT © Matin Mahmood (X: @MatinMnM)
