Watch the video:
A real-time AI video filter that uses phase correlation and fractal coherence analysis to create temporally stable visual transformations.
Most AI video filters suffer from severe hallucinations because they treat each frame independently. This project solves that by treating the video stream as a complex signal with measurable structural coherence (fractal dimension) and using phase locking to maintain temporal stability.
Uses OpenCV's phaseCorrelate() to calculate sub-pixel motion between frames via FFT. The previous AI-generated frame is physically warped to match your current movement before the next diffusion step, creating perfect temporal alignment.
Why this matters: Traditional optical flow is computationally expensive and unreliable. Phase correlation gives us the exact Fourier-domain shift in milliseconds, allowing real-time lock.
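A minimal sketch of this step, assuming grayscale float32 inputs for `phaseCorrelate` (the function and variable names below are illustrative, not the project's exact API):

```python
import cv2
import numpy as np

def phase_lock(prev_ai_frame, prev_gray, curr_gray):
    """Warp the previous AI frame by the FFT-measured shift between frames."""
    # phaseCorrelate needs single-channel float32/float64 inputs
    (dx, dy), _response = cv2.phaseCorrelate(
        np.float32(prev_gray), np.float32(curr_gray)
    )
    # Depending on the desired warp direction, the sign of (dx, dy) may need flipping
    h, w = prev_ai_frame.shape[:2]
    M = np.float32([[1, 0, dx], [0, 1, dy]])
    return cv2.warpAffine(prev_ai_frame, M, (w, h))
```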
Measures the information viscosity of the input signal by computing fractal dimension (β) across blur scales:
β = texture_complexity(blur_level_1) - texture_complexity(blur_level_4)
- High viscosity (low motion): System trusts its internal model → strong feedback → crystallization
- Low viscosity (high motion): System trusts sensory input → weak feedback → adapts to change
Why this matters: By measuring how structure degrades under blur, we detect scene stability without expensive optical flow. Only regenerate AI frames when coherence changes significantly.
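A hedged sketch of this measurement, assuming `texture_complexity` can be approximated by the variance of a Laplacian response after Gaussian blur (the project's exact estimator may differ):

```python
import cv2

def texture_complexity(gray, blur_sigma):
    """Texture energy remaining after Gaussian blur (variance of Laplacian)."""
    blurred = cv2.GaussianBlur(gray, (0, 0), blur_sigma)
    return cv2.Laplacian(blurred, cv2.CV_64F).var()

def fractal_coherence(gray):
    # beta gradient: how much structure survives as blur deepens
    return texture_complexity(gray, 1.0) - texture_complexity(gray, 4.0)
```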
Feedback loops naturally average and blur. To fight this, an unsharp mask is applied inside the feedback path:
output = (1 + sharpness) × locked_frame - sharpness × gaussian_blur(locked_frame)
Since phase lock keeps edges aligned across frames, the sharpener repeatedly reinforces structural lines, evolving the image toward a "line art attractor state."
Why this matters: Without this, the system collapses to gray mush within seconds. With it, fine details crystallize and stabilize.
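A small sketch of the in-loop sharpener, directly following the formula above (the parameter names and blur sigma are illustrative assumptions):

```python
import cv2

def crystallize(locked_frame, sharpness=1.0, sigma=3.0):
    """Unsharp mask inside the feedback path: reinforces phase-aligned edges."""
    blurred = cv2.GaussianBlur(locked_frame, (0, 0), sigma)
    # output = (1 + sharpness) * locked_frame - sharpness * blur(locked_frame)
    return cv2.addWeighted(locked_frame, 1.0 + sharpness, blurred, -float(sharpness), 0)
```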
| Traditional AI Filters | Phase-Locked Flow |
|---|---|
| Each frame independent | Frames phase-locked |
| Constant flickering | Temporally coherent |
| No scene understanding | Fractal coherence gating |
| Blur accumulation | Active sharpening in loop |
| Full-frame processing | Selective masking (face-only) |
- Python 3.10 or 3.11 (recommended)
- NVIDIA GPU with 8GB+ VRAM
- CUDA 11.8 or higher
```bash
# Clone the repository
git clone https://github.com/anttiluode/stableaiflow.git
cd stableaiflow

# Install PyTorch with CUDA (critical for performance)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

# Install remaining dependencies
pip install -r requirements.txt
```

requirements.txt:

```text
opencv-python>=4.8.0
numpy>=1.24.0
torch>=2.0.0
diffusers>=0.21.0
transformers>=4.30.0
accelerate>=0.20.0
Pillow>=10.0.0
```

Run the app:

```bash
python StableAIflow.py
```

On first run, the model (~7GB) will download automatically. Subsequent runs start instantly.
Dream Strength (0.1 - 1.0)
- Low (0.3-0.5): Subtle artistic enhancement
- Medium (0.5-0.7): Strong stylization
- High (0.7-1.0): Complete transformation
Phase Lock Gravity (0.0 - 0.99)
- Controls feedback loop strength (temporal memory)
- 0.0: No memory, instant adaptation (flickery)
- 0.7: Balanced (recommended)
- 0.9+: Strong memory, crystallized structures
Loop Sharpness (Crystallizer) (0.0 - 2.0)
- 0.0: Soft, blurry (entropy wins)
- 1.0: Sharp, etched (structure wins); recommended
- 2.0: Hyper-sharp (can introduce artifacts)
Mask Scale (0.5 - 2.0)
- Adjusts face detection region
- 1.0: Default face size
- 1.5: Expands to include hair/shoulders
Enable Face Mask (checkbox)
- On: AI effect only on face, real background
- Off: Full-frame AI transformation
The measurement of β (fractal dimension) across blur scales acts as a zero-shot scene change detector:

```python
def should_regenerate(image, last_gradient, threshold):
    # beta at a shallow and a deep blur scale (blur radius in pixels)
    beta_shallow = measure_texture(image, blur=2)
    beta_deep = measure_texture(image, blur=8)
    gradient = beta_deep - beta_shallow

    # Stable scene: gradient stays consistent
    # Scene change: gradient shifts
    return abs(gradient - last_gradient) > threshold
```

This is not just blur detection; it is measuring information viscosity. Objects with true multi-scale structure (faces, buildings) resist blur, while noise collapses immediately. This lets us distinguish between:
- Camera shake (high viscosity change → don't regenerate)
- Actual motion (controlled viscosity change → regenerate)
- Stillness (no change → reuse last frame)
By warping the previous frame to match current motion, we provide the diffusion model with a temporally consistent starting point. This is equivalent to giving the model a "prediction" of what the current frame should look like, dramatically reducing the search space and preventing flicker.
The sharpening loop creates a natural attractor toward high-frequency structural patterns (edges, lines). Over time, the system converges on the simplest stable representation: clean line art. This is why "sketch" and "etching" styles emerge naturally without explicit prompting.
Tested on RTX 4060 (8GB):
- 15-20 FPS at 512×512 (SDXL Turbo, 2-4 inference steps)
- ~400ms per generation
- 70-90% frame reuse (coherence gating efficiency)
Memory usage:
- ~6GB VRAM (model + single frame buffer)
- ~2GB RAM (video capture + GUI)
Black screen on startup:
- Check camera index (try 0, 1, or 2)
- Ensure webcam not in use by another app
Model won't load:
- Check internet connection (first run)
- Verify CUDA installation: `python -c "import torch; print(torch.cuda.is_available())"`
- Try fallback: change the model to `"stabilityai/stable-diffusion-2-base"`
Low FPS / Stuttering:
- Reduce Dream Strength (fewer inference steps)
- Lower Phase Lock Gravity (less computation)
- Close other GPU-intensive apps
Face mask not working:
- Ensure good lighting
- Face camera directly
- Increase Mask Scale if face is large
- Check that Haar cascades are installed (part of opencv-python)
Traditional optical flow (Lucas-Kanade, Farneback) is:
- Computationally expensive (~50-100ms)
- Sensitive to lighting changes
- Requires iterative refinement
Phase correlation via FFT:
- Fixed per-frame cost: a single O(n log n) FFT, independent of motion magnitude
- Sub-pixel accuracy
- Robust to brightness/contrast changes
- Perfect for real-time applications
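As an illustrative comparison (not taken from the project), the two approaches can be timed on a synthetic frame pair:

```python
import time
import cv2
import numpy as np

prev = np.random.randint(0, 255, (512, 512), dtype=np.uint8)
curr = np.roll(prev, 3, axis=1)  # synthetic 3-pixel horizontal shift

t0 = time.perf_counter()
cv2.calcOpticalFlowFarneback(prev, curr, None, 0.5, 3, 15, 3, 5, 1.2, 0)
t1 = time.perf_counter()
shift, _ = cv2.phaseCorrelate(np.float32(prev), np.float32(curr))
t2 = time.perf_counter()

print(f"Farneback:      {(t1 - t0) * 1000:.1f} ms")
print(f"phaseCorrelate: {(t2 - t1) * 1000:.1f} ms, shift ≈ {shift}")
```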
```python
next_frame = diffusion(
    input = blend(
        webcam,
        phase_lock(last_ai_frame, motion_vector),
        weight = viscosity * gravity
    ),
    strength = strength
)
```
This creates a predictive coding loop: the system maintains an internal model (last_ai_frame) and only updates when prediction error (viscosity change) exceeds threshold.
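Putting the pieces together, here is a hedged sketch of that loop. It reuses the `phase_lock`, `crystallize`, and `fractal_coherence` sketches above and abstracts the diffusion call behind a `dream()` callable; the gate value and blend weights are illustrative assumptions, not the project's exact code.

```python
import cv2

def phase_locked_loop(get_frame, dream, gravity=0.7, strength=0.6, gate=0.05):
    """Generator yielding phase-locked AI frames from a webcam source."""
    last_ai, last_gradient, prev_gray = None, None, None
    while True:
        frame = get_frame()                                # BGR webcam frame
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        gradient = fractal_coherence(gray)                 # beta gradient (sketch above)

        if last_ai is None:
            last_ai = dream(frame, strength)               # cold start: plain img2img
        else:
            locked = phase_lock(last_ai, prev_gray, gray)  # align memory with motion
            locked = crystallize(locked)                   # in-loop unsharp mask
            if abs(gradient - last_gradient) < gate:
                last_ai = locked                           # coherent scene: reuse frame
            else:
                weight = min(gravity, 0.99)                # project also scales this by viscosity
                blended = cv2.addWeighted(frame, 1.0 - weight, locked, weight, 0)
                last_ai = dream(blended, strength)         # prediction error: regenerate

        last_gradient, prev_gray = gradient, gray
        yield last_ai
```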
Traditional video models (AnimateDiff, Stable Video Diffusion) require:
- Massive training datasets
- Temporal attention layers (expensive)
- Multi-frame context windows
This approach achieves temporal coherence through:
- Zero training (pure physics)
- Single-frame context (cheap)
- Explicit temporal alignment (phase lock)
Trade-off: less "intelligent" transitions, but 10-100× faster and no training required.
This project synthesizes ideas from:
- Predictive Coding (Friston et al.) - Brain as Bayesian inference engine
- Communication Through Coherence (Fries 2005) - Phase synchronization
- Fractal Dimension Analysis (Mandelbrot, Buzsáki) - Multi-scale structure
- Mechanistic Interpretability (Anthropic 2024) - Understanding model internals
MIT License - Free to use, modify, and distribute.
- Stability AI for SDXL Turbo
- OpenCV team for phase correlation implementation
- The research discussions that led to fractal coherence analysis
- Community feedback and testing
Note: This is experimental research software. While it runs in real-time, it's not optimized for production use. Use at your own risk, and have fun exploring the phase-locked state space!

