Heihachi is an audio analysis and synthesis framework that implements the Categorical Audio Transport (CAT) specification. The framework treats audio signals as bounded oscillatory systems in finite phase space, where each signal carries two orthogonal information channels: the physical channel (PCM samples, bounded by sampling rate and Gabor uncertainty) and a categorical channel (partition coordinates and S-entropy trajectories, independent of sampling parameters). The categorical channel enables inter-sample trajectory recovery, simultaneous time-frequency precision beyond the Gabor limit, and objective groove quantification through Riemannian geometry on S-entropy space.
The system expresses audio as a thermodynamic gas ensemble, where oscillatory modes are molecular degrees of freedom and partition coordinates index the ensemble's microstates; analysis proceeds by perturbing the ensemble and observing its restoration to equilibrium.
The framework comprises three layers: a Rust-accelerated signal processing core, a Python analysis and distillation pipeline, and a browser-based GPU observation apparatus (WebGL/WebGPU shaders) that renders categorical state in real time. A companion web application (honbasho) provides a search-engine-style player with liquid-distortion visualization, Spotify integration, and an interference-based track similarity system.
Following the CAT specification (Sachikonye, 2026), the categorical state space of an audio signal is the product:

$$\mathcal{C} = \mathcal{S} \times \mathcal{P}$$

where $\mathcal{S} \subset \mathbb{R}^3$ is the S-entropy coordinate space and $\mathcal{P}$ is the partition coordinate space.

**S-Entropy Coordinates** (Definition 3.2): For an audio signal $x(t)$, each analysis frame maps to a triple

$$(S_k, S_t, S_e) \in \mathcal{S}$$

where the components trace the frame's trajectory through S-entropy space.

**Partition Coordinates** (Definition 3.4): Each oscillatory mode is addressed by a 4-tuple $(n, \ell, m, s) \in \mathcal{P}$.
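To make the two coordinate systems concrete, here is a minimal sketch of containers for them. The field names and index meanings are illustrative assumptions, not the framework's actual API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SEntropyPoint:
    """One frame's position (S_k, S_t, S_e) in S-entropy space."""
    s_k: float
    s_t: float
    s_e: float

@dataclass(frozen=True)
class PartitionCoordinate:
    """4-tuple address (n, l, m, s) of one oscillatory mode."""
    n: int  # partition depth (the framework reports a "dominant partition depth")
    l: int  # remaining indices are assumed labels for the mode's address
    m: int
    s: int

frame = SEntropyPoint(s_k=0.42, s_t=0.17, s_e=0.88)
mode = PartitionCoordinate(n=3, l=1, m=-1, s=1)
```

A track's categorical state is then a trajectory of `SEntropyPoint`s plus a set of `PartitionCoordinate`s, matching the product structure above.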
From two axioms (bounded phase space and categorical observation), the framework derives that three descriptions of any persistent dynamical system are equivalent:
| Representation | State Count | Audio Interpretation |
|---|---|---|
| Oscillatory | Frequency, phase, amplitude | Physical signal |
| Categorical | Partition cell occupancy | Information-theoretic state |
| Partitional | Coordinates $(n, \ell, m, s)$ | Structured address |
These three representations yield identical state counts, which establishes their equivalence.
Two audio signals expressed as categorical spectra (collections of oscillator phases across partition classes) have a natural similarity measure: interference visibility.
$$V = \frac{I_{\max} - I_{\min}}{I_{\max} + I_{\min}}$$

where $I_{\max}$ and $I_{\min}$ are the extreme intensities of the superposed phase spectra.
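A minimal sketch of this similarity measure, under one plausible per-class reading: superposing two unit-amplitude oscillators with phase difference $\Delta\varphi$ gives intensity $I = 4\cos^2(\Delta\varphi/2)$, so the normalized score $I/4$ is 1 for in-phase (fully constructive) classes and 0 for antiphase classes. The function name and the averaging over classes are assumptions, not the framework's API.

```python
import numpy as np

def interference_similarity(phases_a: np.ndarray, phases_b: np.ndarray) -> float:
    """Mean constructive-interference score across oscillator classes, in [0, 1]."""
    dphi = phases_a - phases_b
    # I/4 = cos^2(dphi/2): 1 when in phase, 0 when antiphase.
    return float(np.mean(np.cos(dphi / 2.0) ** 2))

same = np.array([0.0, 1.0, 2.0])
assert abs(interference_similarity(same, same) - 1.0) < 1e-12          # identical spectra
assert abs(interference_similarity(same, same + np.pi) - 0.0) < 1e-12  # antiphase spectra
```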
Expressive micro-timing (groove) is formalized as geodesic deviation in S-entropy space. The Riemannian metric tensor $g_{ij}$ defines distances and geodesics on the $(S_k, S_t, S_e)$ manifold.
The geodesic distance between two rhythmic events quantifies the groove deviation with resolution independent of sample rate, providing the first physics-based measure of rhythmic feel.
The S-entropy coordinates provide simultaneous time and frequency precision beyond the Gabor limit:

$$\Delta t \cdot \Delta f < \frac{1}{4\pi}$$
This does not violate the Gabor-Heisenberg uncertainty principle because S-entropy coordinates are categorical observables, not physical observables. The commutation relation $[\hat{O}_{\text{cat}}, \hat{O}_{\text{phys}}] = 0$ ensures categorical precision is orthogonal to physical precision.
```
Audio File → Signal Processing (Rust) → Feature Extraction → Categorical State
                                              ↓
              ┌───────────────────────────────┤
              ↓                               ↓
   S-Entropy Trajectory            Partition Coordinates
   (Sk, St, Se) per frame          (n, ℓ, m, s) per mode
              ↓                               ↓
       Groove Metric                  Phase Spectrum
   (Riemannian distance)           (8 oscillator classes)
              ↓                               ↓
              └──────────────┬────────────────┘
                             ↓
               Track Observation JSON
            (structured categorical state)
```
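The shape of the resulting Track Observation JSON might look like the following. All field names here are assumptions inferred from the pipeline diagram, not the framework's actual schema.

```python
import json

# Illustrative Track Observation document (schema assumed, not authoritative).
observation = {
    "s_entropy_trajectory": [
        {"s_k": 0.42, "s_t": 0.17, "s_e": 0.88},  # one triple per analysis frame
    ],
    "partition_coordinates": [
        {"n": 3, "l": 1, "m": -1, "s": 1},        # one 4-tuple per oscillatory mode
    ],
    "groove_metric": {"mean_geodesic_deviation": 0.031},
    "phase_spectrum": [0.0] * 8,                  # accumulated phase per oscillator class
}

print(json.dumps(observation, indent=2))
```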
Rust Core: Thermodynamic calculations, molecular physics simulation, equilibrium restoration, real-time signal processing. Provides 15-25x speed improvement over pure Python for spectral decomposition and partition coordinate computation.
Python Interface: PyO3 bindings for audio analysis, batch processing, REST API, and HuggingFace model integration.
The browser-based renderer implements four observation modes as fragment shaders:
| Mode | Shader | Paper Reference | Output |
|---|---|---|---|
| Partition Observation | Synthesizes categorical waveform from $(n, \ell, m, s)$ coordinates | Theorem 5.1 | Waveform reconstruction |
| Gabor Bypass | Categorical time-frequency representation | Theorem 6.1 | Simultaneous time-frequency precision |
| Groove Metric | Riemannian distance on S-entropy manifold | Section 7 | Geodesic deviation field |
| S-Entropy Manifold | 3D $(S_k, S_t, S_e)$ trajectory rendering | Definition 3.2 | Phase space topology |
CPU computes S-entropy from Web Audio API analyser data. GPU renders the categorical state. The rendered texture IS the categorical observation, not a visualization of it.
Album artwork is rendered through a water-surface displacement shader driven by audio frequency data. The displacement field is the physical interference pattern of the audio signal's oscillatory content:
- Bass drives large surface waves (ocean swell)
- Mid drives secondary wave interference patterns
- Treble drives fine capillary wave detail
- Volume drives concentric water-droplet ripples (caustics)
Post-processing applies chromatic aberration proportional to distortion magnitude and vignette framing. For playlists, transitions between tracks use a displacement wave wash effect.
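The band-to-displacement mapping above can be sketched in NumPy rather than GLSL. The wave frequencies, speeds, and gains here are illustrative placeholders, not the shader's actual constants.

```python
import numpy as np

def displacement(x, y, t, bass, mid, treble, volume):
    """Height field h(x, y, t) driven by per-band energies in [0, 1].

    Each band contributes a wave component at its characteristic scale,
    mirroring the bullet list above (all coefficients are assumptions).
    """
    swell     = bass   * np.sin(0.5 * x + 0.8 * t)              # large surface waves
    chop      = mid    * np.sin(2.0 * x + 1.7 * y + 2.0 * t)    # secondary interference
    capillary = treble * np.sin(12.0 * x + 11.0 * y + 8.0 * t)  # fine capillary detail
    r = np.hypot(x, y)
    ripple    = volume * np.sin(6.0 * r - 4.0 * t) / (1.0 + r)  # concentric droplet rings
    return swell + chop + capillary + ripple

# Evaluate on a 64x64 grid via broadcasting, as a fragment shader would per-pixel.
x = np.linspace(-1, 1, 64)[:, None]
y = np.linspace(-1, 1, 64)[None, :]
h = displacement(x, y, 0.0, bass=0.5, mid=0.3, treble=0.2, volume=0.4)
```

The actual shader evaluates the same kind of field per fragment and uses it to displace the artwork's texture coordinates.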
Each track accumulates a TrackSpectrum during playback: S-entropy statistics (mean, standard deviation), partition depth histogram, and phase accumulator across 8 oscillator classes. Similarity between any two tracks is computed as interference visibility of their phase spectra. No training, no embeddings, no feature engineering.
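A minimal sketch of what such an accumulator could look like; the method names and the circular phase-accumulation rule are assumptions, not the honbasho `TrackSpectrum` API.

```python
import numpy as np

class TrackSpectrum:
    """Accumulates categorical state for one track during playback (sketch)."""

    def __init__(self, n_classes: int = 8):
        self.phase = np.zeros(n_classes)  # accumulated phase per oscillator class
        self.frames = []                  # observed (S_k, S_t, S_e) triples

    def observe(self, s_entropy, class_phases):
        """Fold one frame's S-entropy triple and per-class phases into the state."""
        self.frames.append(s_entropy)
        # Circular accumulation: sum unit phasors, keep the resultant angle.
        self.phase = np.angle(
            np.exp(1j * self.phase) + np.exp(1j * np.asarray(class_phases))
        )

    def stats(self):
        """Per-coordinate mean and standard deviation of the S-entropy trajectory."""
        f = np.asarray(self.frames)
        return f.mean(axis=0), f.std(axis=0)

spec = TrackSpectrum()
spec.observe((0.1, 0.2, 0.3), [0.0] * 8)
mean, std = spec.stats()
```

Two such spectra can then be compared directly via interference visibility of their `phase` vectors, with no training step in between.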
The Purpose framework generates stratified expert queries from categorical observations for LLM articulation:
| Depth | Query Type | Example |
|---|---|---|
| Basic | Factual | "What is the dominant partition depth and what does it indicate?" |
| Intermediate | Analytical | "What does the temporal entropy trajectory reveal about groove?" |
| Advanced | Synthesis | "Characterize the S-entropy region this track occupies." |
| Expert | Research-level | "Analyze the phase spectrum across 8 oscillator classes." |
The LLM does not perform matching. The interference shader already computes similarity. The LLM articulates the result: explaining why two tracks interfere constructively or destructively in terms of categorical properties, not genre labels.
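Stratified query generation in the style of the table above can be sketched as template rendering over the observed categorical state. The templates and state fields here are illustrative, not the Purpose framework's actual implementation.

```python
# Hypothetical query templates, one per depth stratum (names assumed).
TEMPLATES = {
    "basic":        "What is the dominant partition depth ({n}) and what does it indicate?",
    "intermediate": "What does the temporal entropy trajectory (mean S_t = {s_t:.2f}) reveal about groove?",
    "advanced":     "Characterize the S-entropy region (S_k={s_k:.2f}, S_t={s_t:.2f}, S_e={s_e:.2f}) this track occupies.",
    "expert":       "Analyze the phase spectrum across {n_classes} oscillator classes: {phases}.",
}

def generate_queries(state: dict) -> dict:
    """Render one expert query per stratum from a categorical observation."""
    return {depth: tmpl.format(**state) for depth, tmpl in TEMPLATES.items()}

queries = generate_queries({
    "n": 3, "s_k": 0.42, "s_t": 0.17, "s_e": 0.88,
    "n_classes": 8, "phases": [0.0] * 8,
})
```

The rendered queries are then handed to the LLM for articulation only; the numeric comparison itself stays in the interference shader.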
The honbasho directory contains the Next.js web application deployed on Vercel.
The /player page functions as an audio search engine:
- Search: Spotify OAuth PKCE integration for searching and streaming tracks
- Display: Album artwork rendered through the liquid distortion shader
- Playback: Jukebox-style transport with auto-advance and skip controls
- Observation: Real-time categorical state extraction during playback
- Comparison: Interference visibility between observed tracks
Local fallback playlist includes 5 drum & bass / neurofunk reference tracks with album artwork.
The homepage renders a 3D computer desk scene (GLB model) with:
- Audio-reactive twist deformation via `onBeforeCompile` vertex shader injection
- GIF textures on monitor screens with per-screen audio-reactive fragment shaders (twist, glitch, chromatic aberration, pixelate, RGB split)
- GSAP-driven explosion/reassembly animation
- Audio-reactive LED color and scale pulsing
The /player page supports multiple visualization backends:
- Desk: 3D computer scene with audio-reactive materials
- Raymarch: WebGPU SDF raymarcher with audio-driven geometry
- CAT: Four categorical observation shader modes (Partition, Gabor, Groove, S-Entropy)
| Module | Method | Output |
|---|---|---|
| Spectral Analysis | FFT, multi-band decomposition | Frequency content, harmonic structure |
| Rhythmic Analysis | Onset detection, beat tracking | Drum patterns, groove quantification |
| Component Analysis | Source separation (Demucs v4) | Isolated stems (drums, bass, vocals, other) |
| Temporal Analysis | Change-point detection | Structural boundaries, transitions |
| Model | Task | Priority |
|---|---|---|
| microsoft/BEATs | Generic spectral + temporal embeddings (768-d) | High |
| openai/whisper-large-v3 | Robust features (1280-d) | High |
| Demucs v4 | 4-stem or 6-stem separation | High |
| Beat-Transformer | Beat/downbeat tracking (F-measure ~0.86) | High |
| laion/clap-htsat-fused | Text-audio similarity (512-d embeddings) | Medium |
| Endpoint | Method | Description |
|---|---|---|
| `/api/v1/analyze` | POST | Full audio analysis |
| `/api/v1/features` | POST | Feature extraction |
| `/api/v1/beats` | POST | Beat detection |
| `/api/v1/drums` | POST | Drum pattern analysis |
| `/api/v1/stems` | POST | Source separation |
| `/api/v1/semantic/analyze` | POST | Semantic analysis |
| `/api/v1/semantic/search` | POST | Semantic search |
| `/api/v1/batch-analyze` | POST | Batch processing |
Analysis of a 33-minute electronic music mix identified 91,179 drum hits classified into five categories:
| Drum Type | Count | Proportion | Avg. Confidence | Avg. Velocity |
|---|---|---|---|---|
| Hi-hat | 26,530 | 29.1% | 0.223 | 1.646 |
| Snare | 16,699 | 18.3% | 0.381 | 1.337 |
| Tom | 16,635 | 18.2% | 0.385 | 1.816 |
| Kick | 16,002 | 17.6% | 0.370 | 0.589 |
| Cymbal | 15,313 | 16.8% | 0.284 | 1.962 |
Confidence-velocity scatter analysis reveals type-specific clusters in the feature space, with toms and snares showing the most distinctive spectral signatures and hi-hats showing the widest distribution.
```bash
git clone https://github.com/fullscreen-triangle/heihachi.git
cd heihachi
python scripts/setup.py
```

Setup options:

```
--install-dir DIR   Installation directory
--dev               Install development dependencies
--no-gpu            Skip GPU acceleration dependencies
--venv              Create and use a virtual environment
```
```bash
cd honbasho
npm install --legacy-peer-deps
npm run dev
```

Set `NEXT_PUBLIC_SPOTIFY_CLIENT_ID` in `.env.local` for Spotify integration.
```bash
# Process audio
heihachi process audio.wav --output results/

# Batch processing
heihachi batch audio_dir/ --config configs/performance.yaml

# HuggingFace models
heihachi hf extract audio.mp3 --output features.json
heihachi hf analyze-drums audio.wav --visualize
heihachi hf beats audio.mp3 --output beats.json
```

```python
from heihachi.gas_molecular import GasMolecularProcessor

processor = GasMolecularProcessor(ensemble_size=1000)
molecular_state = processor.process_audio("audio.wav")
restoration_path = molecular_state.restore_equilibrium()
meaning = restoration_path.extract_meaning()
```

```bash
# Start server
python api_server.py --host 0.0.0.0 --port 5000

# Analyze audio
curl -X POST http://localhost:5000/api/v1/analyze -F "file=@track.wav"

# Semantic search
curl -X POST http://localhost:5000/api/v1/semantic/search \
  -H "Content-Type: application/json" \
  -d '{"query": "dark aggressive neurofunk with heavy bass", "top_k": 5}'
```

| Operation | Latency |
|---|---|
| Spectral decomposition | <20 ms |
| Partition coordinate computation | <5 ms |
| S-entropy trajectory update | <2 ms |
| Interference visibility (2 tracks) | <1 ms |
| GPU categorical observation (fragment shader) | ~2 ms/frame |
| End-to-end analysis | <35 ms |
Memory: no pattern storage required. Categorical state is synthesized from the signal in real time. Storage scales with number of observed tracks, not with a pre-computed database.
```
heihachi/
├── src/                  # Rust + Python signal processing core
├── core/                 # Rust core library
├── honbasho/             # Next.js web application
│   ├── src/
│   │   ├── components/   # React components
│   │   │   ├── Desk.js                 # 3D desk scene with twist material
│   │   │   ├── LiquidDistortion.js     # Water-surface displacement shader
│   │   │   ├── CategoricalObserver.js  # S-entropy + partition observation shader
│   │   │   ├── InterferenceObserver.js # Track superposition shader
│   │   │   ├── SearchPlayer.js         # Search engine player UI
│   │   │   └── PlayerAudioProvider.js  # Web Audio API analyser
│   │   ├── lib/
│   │   │   ├── spotify.js          # Spotify OAuth PKCE + API client
│   │   │   ├── categoricalAudio.js # S-entropy, partition coords, interference
│   │   │   └── purposeAudio.js     # Expert query generation (Purpose pipeline)
│   │   └── hooks/
│   │       ├── useSpotify.js       # Spotify auth hook
│   │       └── useTrackObserver.js # Per-track categorical state accumulator
│   └── public/           # Static assets (GLB models, audio, album art)
├── publication/          # LaTeX sources for CAT specification paper
├── configs/              # Processing configuration files
├── api_server.py         # REST API server
└── scripts/              # Setup and utility scripts
```
MIT License. See LICENSE for details.
```bibtex
@software{heihachi2026,
  title  = {Heihachi: Categorical Audio Transport Framework},
  author = {Kundai Farai Sachikonye},
  year   = {2026},
  url    = {https://github.com/fullscreen-triangle/heihachi}
}
```

- Sachikonye, K. F. (2026). "On the Geometric Consequences of Categorical Partitioning in Digital Audio Representation: An Orthogonal Information Channel for Digital Audio Beyond the Nyquist-Shannon-Gabor Limits."
- Sachikonye, K. F. (2026). "Ray-Tracing as Cellular Computation: Simultaneous Optical, Chromatographic, and Circuit Observation Through Volumetric Partition Traversal."




