Project-Cortex v2.0 - 3D Spatial Audio Navigation System

Implementation Plan
Last Updated: November 26, 2025
Author: Haziq (@IRSPlays)
Status: ✅ Implemented with Procedural Sounds

⚠️ CRITICAL: BODY-RELATIVE NAVIGATION ⚠️

╔══════════════════════════════════════════════════════════════════╗
║  ⚠️  BODY-RELATIVE NAVIGATION  ⚠️                                  ║
╠══════════════════════════════════════════════════════════════════╣
║  This device uses Body-Relative Navigation.                      ║
║  You must TURN YOUR TORSO to center the sound,                  ║
║  not just your head.                                             ║
║                                                                  ║
║  The camera is mounted on your chest/torso, so sounds           ║
║  are positioned relative to where your body is facing.          ║
╚══════════════════════════════════════════════════════════════════╝

When demonstrating to YIA Judges:

"This device uses Body-Relative Navigation. You must turn your torso to center the sound, not just your head."

Why Body-Relative?

The IMX415 camera is mounted on the user's chest/torso
Sound positions are calculated from the camera's perspective
We do NOT use head tracking (no IMU/gyroscope on headphones)
This is simpler, more reliable, and works with any Bluetooth headphones

🎯 Executive Summary

This document outlines the complete implementation plan for the 3D Spatial Audio Navigation System - a core feature of Project-Cortex designed to help visually impaired users navigate their environment using binaural audio cues.

Key Features

Audio Beacons - Continuous directional sounds that guide users to targets
Proximity Alerts - Distance-based warning sounds that intensify as objects approach
Object Tracking - Sound sources that follow detected objects in real-time
Obstacle Warnings - Distinct alert sounds for safety-critical obstacles
Distance Estimation - Calculate real-world distance from YOLO bounding boxes
Object-Specific Sounds - Different audio cues for different object classes

🏗️ System Architecture

┌──────────────────────────────────────────────────────────────────────────────────┐
│                              USER QUERY / CONTINUOUS MODE                        │
│                    "Guide me to my chair" / Auto-detect obstacles                │
└─────────────────────────────────────┬────────────────────────────────────────────┘
                                      ▼
┌──────────────────────────────────────────────────────────────────────────────────┐
│                         LAYER 1: YOLO DETECTION OUTPUT                           │
│  ┌─────────────────────────────────────────────────────────────────────────────┐ │
│  │  Detection {                                                                │ │
│  │    class_name: "chair",                                                     │ │
│  │    confidence: 0.89,                                                        │ │
│  │    bbox: [x1, y1, x2, y2],  // Pixel coordinates                           │ │
│  │    frame_size: [1920, 1080]                                                 │ │
│  │  }                                                                          │ │
│  └─────────────────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────┬────────────────────────────────────────────┘
                                      ▼
┌──────────────────────────────────────────────────────────────────────────────────┐
│                      SPATIAL POSITION CALCULATOR                                  │
│  ┌─────────────────────────────────────────────────────────────────────────────┐ │
│  │  bbox_to_3d_position(bbox, frame_size) → (x, y, z)                          │ │
│  │                                                                              │ │
│  │  x (horizontal): bbox_center_x → [-1.0, +1.0] (left to right)              │ │
│  │  y (vertical):   bbox_center_y → [-1.0, +1.0] (bottom to top)              │ │
│  │  z (depth):      bbox_area     → [-0.5, -10.0] (close to far)              │ │
│  └─────────────────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────┬────────────────────────────────────────────┘
                                      ▼
┌──────────────────────────────────────────────────────────────────────────────────┐
│                         SPATIAL AUDIO MANAGER                                     │
│  ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐        │
│  │ Audio Beacon  │ │ Proximity     │ │ Object        │ │ Obstacle      │        │
│  │ System        │ │ Alert System  │ │ Tracker       │ │ Warner        │        │
│  │               │ │               │ │               │ │               │        │
│  │ Continuous    │ │ Distance-     │ │ Per-object    │ │ Safety-       │        │
│  │ guidance      │ │ based alerts  │ │ sounds        │ │ critical      │        │
│  └───────────────┘ └───────────────┘ └───────────────┘ └───────────────┘        │
└─────────────────────────────────────┬────────────────────────────────────────────┘
                                      ▼
┌──────────────────────────────────────────────────────────────────────────────────┐
│                          OPENAL 3D AUDIO ENGINE                                   │
│  ┌─────────────────────────────────────────────────────────────────────────────┐ │
│  │  Listener (User's head position + orientation)                              │ │
│  │    └── position: (0, 0, 0)     // User is always at origin                 │ │
│  │    └── orientation: (0, 0, -1) // Facing forward (-Z)                      │ │
│  │                                                                              │ │
│  │  Sources (One per tracked object)                                           │ │
│  │    └── chair_source.set_position(x, y, z)                                  │ │
│  │    └── car_source.set_position(x, y, z)                                    │ │
│  │    └── obstacle_source.set_position(x, y, z)                               │ │
│  │                                                                              │ │
│  │  HRTF Processing → Binaural Stereo Output → Bluetooth Headphones           │ │
│  └─────────────────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────────────────┘

📐 Core Algorithm: YOLO Bbox → 3D Position

Mathematical Model

def bbox_to_3d_position(bbox, frame_width, frame_height, known_object_sizes=None):
    """
    Convert YOLO bounding box to 3D audio position.
    
    Coordinate System (OpenAL):
    - X-axis: Left (-) to Right (+)
    - Y-axis: Down (-) to Up (+)  
    - Z-axis: Behind (+) to Front (-)  [User faces -Z]
    
    Args:
        bbox: (x1, y1, x2, y2) in pixels or normalized [0-1]
        frame_width, frame_height: Camera frame dimensions
        known_object_sizes: Dict of object_class → real_world_width (meters)
        
    Returns:
        (x, y, z): 3D position in OpenAL coordinate system
        distance_meters: Estimated real-world distance
    """
    # Normalize bbox to [0, 1] if in pixels
    if bbox[2] > 1.0:
        x1 = bbox[0] / frame_width
        y1 = bbox[1] / frame_height
        x2 = bbox[2] / frame_width
        y2 = bbox[3] / frame_height
    else:
        x1, y1, x2, y2 = bbox
    
    # Calculate center point
    center_x = (x1 + x2) / 2  # Range: 0 to 1
    center_y = (y1 + y2) / 2  # Range: 0 to 1
    
    # Calculate bbox area (normalized)
    width = x2 - x1
    height = y2 - y1
    area = width * height  # Range: 0 to 1
    
    # === HORIZONTAL POSITION (X-axis) ===
    # Map [0, 1] → [-1, +1] (left to right)
    x = (center_x - 0.5) * 2.0
    
    # === VERTICAL POSITION (Y-axis) ===
    # Map [0, 1] → [+1, -1] (top of frame = up in 3D)
    y = (0.5 - center_y) * 2.0
    
    # === DEPTH POSITION (Z-axis) ===
    # Larger bbox = closer = smaller |z|
    # Use inverse relationship: z = -k / sqrt(area)
    MIN_DISTANCE = 0.5   # Minimum z (closest)
    MAX_DISTANCE = 10.0  # Maximum z (farthest)
    AREA_CLOSE = 0.4     # Area when object is close
    AREA_FAR = 0.01      # Area when object is far
    
    # Clamp area to valid range
    area_clamped = max(AREA_FAR, min(AREA_CLOSE, area))
    
    # Inverse mapping: larger area → smaller distance
    normalized = (area_clamped - AREA_FAR) / (AREA_CLOSE - AREA_FAR)
    z = -(MIN_DISTANCE + (1 - normalized) * (MAX_DISTANCE - MIN_DISTANCE))
    
    return (x, y, z)

Distance Estimation with Known Object Sizes

For more accurate distance estimation, use pinhole camera model:

# Known object widths (in meters)
KNOWN_OBJECT_SIZES = {
    "person": 0.5,       # Average shoulder width
    "car": 1.8,          # Average car width
    "chair": 0.5,        # Average chair width
    "door": 0.9,         # Standard door width
    "stairs": 1.0,       # Standard staircase width
    "bicycle": 0.6,      # Handlebar width
    "dog": 0.3,          # Average dog body width
}

def estimate_distance_meters(bbox_width_pixels, frame_width, 
                             object_class, focal_length_pixels=1500):
    """
    Estimate real-world distance using pinhole camera model.
    
    Distance = (Known_Width × Focal_Length) / Bbox_Width
    
    Args:
        bbox_width_pixels: Width of bounding box in pixels
        frame_width: Total frame width in pixels
        object_class: Class name for known size lookup
        focal_length_pixels: Camera focal length in pixels (~1500 for IMX415)
    
    Returns:
        Distance in meters (or None if class unknown)
    """
    if object_class not in KNOWN_OBJECT_SIZES:
        return None
    
    known_width = KNOWN_OBJECT_SIZES[object_class]
    
    if bbox_width_pixels <= 0:
        return None
    
    distance = (known_width * focal_length_pixels) / bbox_width_pixels
    
    return max(0.3, min(20.0, distance))  # Clamp to valid range

🔊 Audio System Components

1. Audio Beacon System (`audio_beacon.py`)

Purpose: Provide continuous directional guidance to a target object.

Behavior:

Emits a rhythmic "ping" sound from the target's 3D position
Ping frequency increases as user gets closer
Changes to success chime when target is reached

Sound Parameters:

Distance	Ping Rate	Volume	Pitch
> 5m	1 Hz	0.3	Low
2-5m	2 Hz	0.5	Medium
1-2m	4 Hz	0.7	Higher
< 1m	8 Hz	1.0	Highest
Reached	Chime	1.0	Success tone

2. Proximity Alert System (`proximity_alert.py`)

Purpose: Warn users of approaching objects/obstacles.

Behavior:

Silent when objects are far (> 3m)
Warning tone increases in urgency as distance decreases
Critical alert for imminent collision (< 0.5m)

Alert Levels:

Level	Distance	Sound	Volume
None	> 3m	Silent	0.0
Notice	2-3m	Soft hum	0.3
Warning	1-2m	Pulse tone	0.6
Danger	0.5-1m	Fast beep	0.8
Critical	< 0.5m	Alarm	1.0

3. Object Sound Mapper (`object_sounds.py`)

Purpose: Assign distinct audio signatures to different object classes.

Sound Mapping:

Object Class	Sound Type	Description
person	Soft footsteps	Human presence indicator
chair	Wooden tap	Furniture indicator
car	Engine hum	Vehicle warning
bicycle	Bell ring	Cyclist warning
door	Knock sound	Passage indicator
stairs	Ascending tones	Level change warning
obstacle	Low buzz	Generic obstacle
target	Beacon ping	User's target

4. Object Tracker (`object_tracker.py`)

Purpose: Maintain audio sources for multiple detected objects.

Behavior:

Create/destroy audio sources as objects enter/exit frame
Smoothly interpolate positions to avoid audio jumps
Limit max concurrent audio sources (default: 5)

🛠️ Implementation Files

File Structure

src/layer3_guide/
├── __init__.py                 # Updated with SpatialAudio integration
├── router.py                   # Existing intent router
├── spatial_audio/
│   ├── __init__.py            # Module exports
│   ├── manager.py             # SpatialAudioManager - main orchestrator
│   ├── position_calculator.py # bbox → 3D position conversion
│   ├── audio_beacon.py        # Directional guidance beacons
│   ├── proximity_alert.py     # Distance-based warnings
│   ├── object_tracker.py      # Multi-object audio source management
│   ├── object_sounds.py       # Object class → sound mapping
│   └── head_tracking.py       # (Future) Headphone gyro integration
└── assets/
    └── sounds/
        ├── beacons/           # Ping, chime, direction sounds
        ├── alerts/            # Warning, danger, critical sounds
        ├── objects/           # Per-object-class sounds
        └── feedback/          # Success, error, notification sounds

Class Hierarchy

# Main orchestrator
class SpatialAudioManager:
    """Central controller for all spatial audio features."""
    
    def __init__(self, config_path: str = "config/spatial_audio.yaml")
    def start(self) -> bool
    def stop(self) -> None
    def update_detections(self, detections: List[Detection]) -> None
    def start_beacon(self, target_class: str) -> bool
    def stop_beacon(self) -> None
    def set_listener_orientation(self, yaw: float, pitch: float, roll: float) -> None
    def get_status(self) -> dict

# Position calculation
class PositionCalculator:
    """Converts 2D bbox to 3D audio position."""
    
    def bbox_to_3d(self, bbox, frame_size) -> Tuple[float, float, float]
    def estimate_distance(self, bbox, object_class) -> Optional[float]
    def smooth_position(self, new_pos, old_pos, alpha=0.3) -> Tuple

# Audio beacon for navigation
class AudioBeacon:
    """Continuous directional guidance sound."""
    
    def __init__(self, sound_file: str)
    def start(self, target_position: Tuple) -> None
    def update_position(self, position: Tuple) -> None
    def update_distance(self, distance_meters: float) -> None
    def stop(self) -> None

# Proximity alert system
class ProximityAlertSystem:
    """Distance-based warning sounds."""
    
    def __init__(self, alert_sounds: dict)
    def update(self, closest_obstacle_distance: float) -> None
    def set_alert_thresholds(self, thresholds: dict) -> None

# Object-specific sounds
class ObjectSoundMapper:
    """Maps object classes to distinct sounds."""
    
    def __init__(self, sound_config: dict)
    def get_sound(self, object_class: str) -> str
    def play_object_sound(self, object_class: str, position: Tuple) -> None

# Multi-object tracking
class ObjectTracker:
    """Manages audio sources for multiple objects."""
    
    def __init__(self, max_sources: int = 5)
    def update(self, detections: List[Detection]) -> None
    def get_active_sources(self) -> List[AudioSource]

📦 Dependencies

Required Packages

# Add to requirements.txt
PyOpenAL>=0.7.11a1       # OpenAL bindings for Python
numpy>=1.24.0            # Array operations
PyYAML>=6.0              # Configuration parsing

System Requirements

Windows:

# OpenAL-Soft is bundled with PyOpenAL
pip install PyOpenAL

Raspberry Pi / Linux:

# Install OpenAL-Soft system library
sudo apt-get install libopenal-dev libopenal1

# Then install Python bindings
pip install PyOpenAL

Optional (for OGG/FLAC support):

pip install PyOgg

⚙️ Configuration

`config/spatial_audio.yaml`

# Spatial Audio Configuration
# Project-Cortex v2.0

audio:
  sample_rate: 44100
  channels: 2
  buffer_size: 4096
  
# Coordinate mapping
position:
  min_distance: 0.5      # meters (closest)
  max_distance: 10.0     # meters (farthest)
  frame_width: 1920
  frame_height: 1080
  focal_length: 1500     # pixels (IMX415 estimate)

# Audio beacon settings
beacon:
  enabled: true
  ping_sound: "assets/sounds/beacons/ping.wav"
  success_sound: "assets/sounds/beacons/success.wav"
  ping_rates:            # Hz at different distances
    far: 1.0
    medium: 2.0
    close: 4.0
    very_close: 8.0
  distance_thresholds:   # meters
    far: 5.0
    medium: 2.0
    close: 1.0

# Proximity alert settings
proximity:
  enabled: true
  sounds:
    notice: "assets/sounds/alerts/notice.wav"
    warning: "assets/sounds/alerts/warning.wav"
    danger: "assets/sounds/alerts/danger.wav"
    critical: "assets/sounds/alerts/critical.wav"
  thresholds:            # meters
    notice: 3.0
    warning: 2.0
    danger: 1.0
    critical: 0.5

# Object sound mapping
objects:
  enabled: true
  max_simultaneous_sources: 5
  sounds:
    person: "assets/sounds/objects/person.wav"
    chair: "assets/sounds/objects/furniture.wav"
    car: "assets/sounds/objects/vehicle.wav"
    bicycle: "assets/sounds/objects/bicycle.wav"
    door: "assets/sounds/objects/door.wav"
    stairs: "assets/sounds/objects/stairs.wav"
    default: "assets/sounds/objects/generic.wav"

# Known object sizes for distance estimation (meters)
object_sizes:
  person: 0.5
  car: 1.8
  chair: 0.5
  door: 0.9
  stairs: 1.0
  bicycle: 0.6
  dog: 0.3
  tv: 1.0
  laptop: 0.35

# Head tracking (future feature)
head_tracking:
  enabled: false
  source: "bluetooth_hid"  # or "imu_sensor"
  update_rate: 30          # Hz

🧪 Testing Plan

Unit Tests

# tests/test_spatial_audio.py

def test_bbox_to_3d_center():
    """Object in center of frame should have x=0, y=0."""
    bbox = (0.4, 0.4, 0.6, 0.6)  # Center 20% of frame
    pos = bbox_to_3d_position(bbox, 1920, 1080)
    assert abs(pos[0]) < 0.1  # x ≈ 0
    assert abs(pos[1]) < 0.1  # y ≈ 0

def test_bbox_to_3d_left():
    """Object on left side should have negative x."""
    bbox = (0.0, 0.4, 0.2, 0.6)  # Left side
    pos = bbox_to_3d_position(bbox, 1920, 1080)
    assert pos[0] < 0  # x is negative (left)

def test_bbox_to_3d_distance_large_bbox():
    """Large bbox should be closer (smaller |z|)."""
    large_bbox = (0.2, 0.2, 0.8, 0.8)  # Takes up 36% of frame
    small_bbox = (0.45, 0.45, 0.55, 0.55)  # Takes up 1% of frame
    
    pos_large = bbox_to_3d_position(large_bbox, 1920, 1080)
    pos_small = bbox_to_3d_position(small_bbox, 1920, 1080)
    
    assert abs(pos_large[2]) < abs(pos_small[2])  # Large is closer

def test_distance_estimation():
    """Known object size should give accurate distance."""
    # Person at 2 meters with 500px bbox width
    distance = estimate_distance_meters(
        bbox_width_pixels=375,  # 500mm / 2m * 1500 focal length
        frame_width=1920,
        object_class="person",
        focal_length_pixels=1500
    )
    assert 1.5 < distance < 2.5  # Within 0.5m accuracy

Integration Tests

# tests/test_spatial_audio_integration.py

def test_full_pipeline():
    """Test complete YOLO → 3D Audio pipeline."""
    manager = SpatialAudioManager()
    manager.start()
    
    # Simulate YOLO detection
    detection = {
        "class_name": "chair",
        "confidence": 0.9,
        "bbox": [100, 200, 300, 400],  # pixels
        "frame_size": [1920, 1080]
    }
    
    manager.update_detections([detection])
    manager.start_beacon("chair")
    
    # Verify audio source created
    assert manager.get_status()["active_sources"] == 1
    assert manager.get_status()["beacon_active"] == True
    
    manager.stop()

📅 Implementation Timeline

Week 1: Core Framework

Day	Task	Deliverable
1	Set up PyOpenAL + test on Windows	Working 3D sound demo
2	Implement PositionCalculator	bbox_to_3d() function
3	Implement SpatialAudioManager	Basic orchestrator
4	Create placeholder sound files	WAV files for testing
5	Unit tests for position math	100% test coverage

Week 2: Feature Implementation

Day	Task	Deliverable
1	Implement AudioBeacon	Navigation guidance
2	Implement ProximityAlertSystem	Distance warnings
3	Implement ObjectSoundMapper	Per-class sounds
4	Implement ObjectTracker	Multi-object support
5	Integration with Layer 1 YOLO	End-to-end test

Week 3: Polish & Testing

Day	Task	Deliverable
1	Fine-tune audio parameters	Optimal UX
2	Test with real YOLO detections	Real-world validation
3	Test on Raspberry Pi 5	Platform validation
4	Documentation updates	README, API docs
5	Demo video recording	YIA presentation asset

🎯 Success Metrics

Metric	Target	Measurement Method
Position Accuracy	±15° azimuth	A/B test with sighted users
Distance Accuracy	±0.5m at 3m	Compare to tape measure
Audio Latency	<100ms	Frame timestamp to audio output
CPU Usage	<15% (audio only)	htop monitoring
User Comprehension	>80% correct direction	User study

🔮 Future Enhancements

Phase 2 (Post-YIA)

Head tracking via Bluetooth HID (AirPods Pro)
IMU sensor integration for head orientation
HRTF personalization based on ear shape
Environmental reverb (indoor vs outdoor)
Voice command: "What's to my left?"

Phase 3 (Long-term)

Machine learning for optimal sound design
Multi-user support (guide + companion)
Integration with Google Maps audio navigation
Smartwatch haptic feedback sync

📚 References

OpenAL Soft Documentation: https://openal-soft.org/
PyOpenAL GitHub: https://github.com/Zuzu-Typ/PyOpenAL
Microsoft Soundscape (Research): https://github.com/microsoft/soundscape
HRTF Explained: https://en.wikipedia.org/wiki/Head-related_transfer_function
3D Audio for Accessibility: IEEE VR 2023 Papers

Document Status: ✅ Ready for Implementation
Next Step: Create src/layer3_guide/spatial_audio/manager.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Project-Cortex v2.0 - 3D Spatial Audio Navigation System

⚠️ CRITICAL: BODY-RELATIVE NAVIGATION ⚠️

🎯 Executive Summary

Key Features

🏗️ System Architecture

📐 Core Algorithm: YOLO Bbox → 3D Position

Mathematical Model

Distance Estimation with Known Object Sizes

🔊 Audio System Components

1. Audio Beacon System (`audio_beacon.py`)

2. Proximity Alert System (`proximity_alert.py`)

3. Object Sound Mapper (`object_sounds.py`)

4. Object Tracker (`object_tracker.py`)

🛠️ Implementation Files

File Structure

Class Hierarchy

📦 Dependencies

Required Packages

System Requirements

⚙️ Configuration

`config/spatial_audio.yaml`

🧪 Testing Plan

Unit Tests

Integration Tests

📅 Implementation Timeline

Week 1: Core Framework

Week 2: Feature Implementation

Week 3: Polish & Testing

🎯 Success Metrics

🔮 Future Enhancements

Phase 2 (Post-YIA)

Phase 3 (Long-term)

📚 References

FilesExpand file tree

spatial-audio-guide.md

Latest commit

History

spatial-audio-guide.md

File metadata and controls

Project-Cortex v2.0 - 3D Spatial Audio Navigation System

⚠️ CRITICAL: BODY-RELATIVE NAVIGATION ⚠️

🎯 Executive Summary

Key Features

🏗️ System Architecture

📐 Core Algorithm: YOLO Bbox → 3D Position

Mathematical Model

Distance Estimation with Known Object Sizes

🔊 Audio System Components

1. Audio Beacon System (audio_beacon.py)

2. Proximity Alert System (proximity_alert.py)

3. Object Sound Mapper (object_sounds.py)

4. Object Tracker (object_tracker.py)

🛠️ Implementation Files

File Structure

Class Hierarchy

📦 Dependencies

Required Packages

System Requirements

⚙️ Configuration

config/spatial_audio.yaml

🧪 Testing Plan

Unit Tests

Integration Tests

📅 Implementation Timeline

Week 1: Core Framework

Week 2: Feature Implementation

Week 3: Polish & Testing

🎯 Success Metrics

🔮 Future Enhancements

Phase 2 (Post-YIA)

Phase 3 (Long-term)

📚 References

1. Audio Beacon System (`audio_beacon.py`)

2. Proximity Alert System (`proximity_alert.py`)

3. Object Sound Mapper (`object_sounds.py`)

4. Object Tracker (`object_tracker.py`)

`config/spatial_audio.yaml`