Skip to content

anttiluode/StableAIflow

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

14 Commits
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

Stable AI Flow: Phase-Locked Live AI Filter

Watch the video:

Watch the video

A real-time AI video filter that uses phase correlation and fractal coherence analysis to create temporally stable visual transformations.

Demo

๐Ÿ”ฌ The Core Discovery: Physics-Based Temporal Coherence

Most AI video filters suffer from severe hallucinations because they treat each frame independently. This project solves that by treating the video stream as a complex signal with measurable structural coherence (fractal dimension) and using phase locking to maintain temporal stability.

The Three Pillars

1. Phase Locking (Temporal Registration)

Uses OpenCV's phaseCorrelate() to calculate sub-pixel motion between frames via FFT. The previous AI-generated frame is physically warped to match your current movement before the next diffusion step, creating perfect temporal alignment.

Why this matters: Traditional optical flow is computationally expensive and unreliable. Phase correlation gives us the exact Fourier-domain shift in milliseconds, allowing real-time lock.

2. Fractal Viscosity (Coherence Gating)

Measures the information viscosity of the input signal by computing fractal dimension (ฮฒ) across blur scales:

ฮฒ = texture_complexity(blur_level_1) - texture_complexity(blur_level_4)
  • High viscosity (low motion): System trusts its internal model โ†’ strong feedback โ†’ crystallization
  • Low viscosity (high motion): System trusts sensory input โ†’ weak feedback โ†’ adapts to change

Why this matters: By measuring how structure degrades under blur, we detect scene stability without expensive optical flow. Only regenerate AI frames when coherence changes significantly.

3. The Crystallizer (Entropy Resistance)

Feedback loops naturally average and blur. To fight this, an unsharp mask is applied inside the feedback path:

output = (1 + sharpness) ร— locked_frame - sharpness ร— gaussian_blur(locked_frame)

Since phase lock keeps edges aligned across frames, the sharpener repeatedly reinforces structural lines, evolving the image toward a "line art attractor state."

Why this matters: Without this, the system collapses to gray mush within seconds. With it, fine details crystallize and stabilize.

๐ŸŽฏ What Makes This Different

Traditional AI Filters Phase-Locked Flow
Each frame independent Frames phase-locked
Constant flickering Temporally coherent
No scene understanding Fractal coherence gating
Blur accumulation Active sharpening in loop
Full-frame processing Selective masking (face-only)

๐Ÿš€ Installation

Prerequisites

  • Python 3.10 or 3.11 (recommended)
  • NVIDIA GPU with 8GB+ VRAM
  • CUDA 11.8 or higher

Setup

# Clone the repository
git clone https://github.com/anttiluode/stableaiflow.git
cd stableaiflow

# Install PyTorch with CUDA (critical for performance)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

# Install remaining dependencies
pip install -r requirements.txt

Requirements

opencv-python>=4.8.0
numpy>=1.24.0
torch>=2.0.0
diffusers>=0.21.0
transformers>=4.30.0
accelerate>=0.20.0
Pillow>=10.0.0

๐ŸŽฎ Usage

python StableAIflow.py

On first run, the model (~7GB) will download automatically. Subsequent runs start instantly.

Controls Guide

Dream Strength (0.1 - 1.0)

  • Low (0.3-0.5): Subtle artistic enhancement
  • Medium (0.5-0.7): Strong stylization
  • High (0.7-1.0): Complete transformation

Phase Lock Gravity (0.0 - 0.99)

  • Controls feedback loop strength (temporal memory)
  • 0.0: No memory, instant adaptation (flickery)
  • 0.7: Balanced (recommended)
  • 0.9+: Strong memory, crystallized structures

Loop Sharpness (Crystallizer) (0.0 - 2.0)

  • 0.0: Soft, blurry (entropy wins)
  • 1.0: Sharp, etched (structure wins) โ† Recommended
  • 2.0: Hyper-sharp (can introduce artifacts)

Mask Scale (0.5 - 2.0)

  • Adjusts face detection region
  • 1.0: Default face size
  • 1.5: Expands to include hair/shoulders

Enable Face Mask (checkbox)

  • โœ… On: AI effect only on face, real background
  • โŒ Off: Full-frame AI transformation

๐Ÿงช The Science: Why This Works

Fractal Coherence as Scene Stability

The measurement of ฮฒ (fractal dimension) across blur scales acts as a zero-shot scene change detector:

def should_regenerate(image):
    ฮฒ_shallow = measure_texture(image, blur=2px)
    ฮฒ_deep = measure_texture(image, blur=8px)
    gradient = ฮฒ_deep - ฮฒ_shallow
    
    # Stable scene: gradient stays consistent
    # Scene change: gradient shifts
    return abs(gradient - last_gradient) > threshold

This is not just blur detectionโ€”it's measuring information viscosity. Objects with true multi-scale structure (faces, buildings) resist blur. Noise collapses immediately. This lets us distinguish between:

  • Camera shake (high viscosity change โ†’ don't regenerate)
  • Actual motion (controlled viscosity change โ†’ regenerate)
  • Stillness (no change โ†’ reuse last frame)

Phase Locking as Temporal Prior

By warping the previous frame to match current motion, we provide the diffusion model with a temporally consistent starting point. This is equivalent to giving the model a "prediction" of what the current frame should look like, dramatically reducing the search space and preventing flicker.

Crystallization as Learned Attractor

The sharpening loop creates a natural attractor toward high-frequency structural patterns (edges, lines). Over time, the system converges on the simplest stable representation: clean line art. This is why "sketch" and "etching" styles emerge naturally without explicit prompting.

๐Ÿ“Š Performance

Tested on RTX 4060 (8GB):

  • 15-20 FPS at 512ร—512 (SDXL Turbo, 2-4 inference steps)
  • ~400ms per generation
  • 70-90% frame reuse (coherence gating efficiency)

Memory usage:

  • ~6GB VRAM (model + single frame buffer)
  • ~2GB RAM (video capture + GUI)

๐Ÿ› Troubleshooting

Black screen on startup:

  • Check camera index (try 0, 1, or 2)
  • Ensure webcam not in use by another app

Model won't load:

  • Check internet connection (first run)
  • Verify CUDA installation: python -c "import torch; print(torch.cuda.is_available())"
  • Try fallback: Change model to "stabilityai/stable-diffusion-2-base"

Low FPS / Stuttering:

  • Reduce Dream Strength (fewer inference steps)
  • Lower Phase Lock Gravity (less computation)
  • Close other GPU-intensive apps

Face mask not working:

  • Ensure good lighting
  • Face camera directly
  • Increase Mask Scale if face is large
  • Check that Haar cascades are installed (part of opencv-python)

๐Ÿ”ฌ Technical Deep Dive

Why Phase Correlation?

Traditional optical flow (Lucas-Kanade, Farneback) is:

  • Computationally expensive (~50-100ms)
  • Sensitive to lighting changes
  • Requires iterative refinement

Phase correlation via FFT:

  • Constant time: O(n log n)
  • Sub-pixel accuracy
  • Robust to brightness/contrast changes
  • Perfect for real-time applications

The Feedback Equation

next_frame = diffusion(
    input = blend(
        webcam,
        phase_lock(last_ai_frame, motion_vector),
        weight = viscosity ร— gravity
    ),
    strength = strength
)

This creates a predictive coding loop: the system maintains an internal model (last_ai_frame) and only updates when prediction error (viscosity change) exceeds threshold.

Comparison to Video Diffusion Models

Traditional video models (AnimateDiff, Stable Video Diffusion) require:

  • Massive training datasets
  • Temporal attention layers (expensive)
  • Multi-frame context windows

This approach achieves temporal coherence through:

  • Zero training (pure physics)
  • Single-frame context (cheap)
  • Explicit temporal alignment (phase lock)

Trade-off: Less "intelligent" transitions, but 10-100ร— faster and requires no training.

๐Ÿ“š Related Research

This project synthesizes ideas from:

  • Predictive Coding (Friston et al.) - Brain as Bayesian inference engine
  • Communication Through Coherence (Fries 2005) - Phase synchronization
  • Fractal Dimension Analysis (Mandelbrot, Buzsรกki) - Multi-scale structure
  • Mechanistic Interpretability (Anthropic 2024) - Understanding model internals

๐Ÿ“„ License

MIT License - Free to use, modify, and distribute.

๐Ÿ™ Acknowledgments

  • Stability AI for SDXL Turbo
  • OpenCV team for phase correlation implementation
  • The research discussions that led to fractal coherence analysis
  • Community feedback and testing

Note: This is experimental research software. While it runs in real-time, it's not optimized for production use. Use at your own risk, and have fun exploring the phase-locked state space! ๐Ÿš€

About

A real-time, physics-based AI video filter that uses **Phase Locking** and **Fractal Feedback** to create stable, temporally consistent hallucinations without training a video model.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages