PairUX Scaling Architecture

Overview

This document outlines PairUX's scaling strategy, inspired by Jitsi's battle-tested architecture. The key principle: start simple, scale when needed.

Philosophy

  1. Don't over-engineer early - P2P works great for most use cases
  2. Separate concerns - Media routing and session control are different problems
  3. Scale horizontally - Add capacity, don't make single components bigger
  4. Encrypt by default - Security shouldn't be an afterthought
  5. Roles matter - Not everyone needs the same capabilities

When to Use What

| Scenario | Recommended Mode | Why |
| --- | --- | --- |
| 1:1 pair programming | P2P | Lowest latency, zero infrastructure cost |
| Small team review (2-5) | P2P | Still manageable, direct connections |
| Demo to team (5-15) | P2P or SFU | SFU reduces host load |
| Presentation (15-50) | SFU | Host can't handle 50 direct streams |
| Large event (50-500) | Multi-SFU | Single SFU becomes bottleneck |
| Enterprise broadcast (500+) | Cascaded SFUs | Regional distribution required |
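
If it helps to see this as code, the table translates into a small helper. The sketch below is illustrative only; recommendMode and its exact thresholds are not part of PairUX.

// Sketch only: map expected audience size to a topology, mirroring the table above
type TopologyMode = 'p2p' | 'sfu' | 'multi-sfu' | 'cascaded';

function recommendMode(expectedParticipants: number): TopologyMode {
  if (expectedParticipants <= 5) return 'p2p';          // 1:1 and small reviews
  if (expectedParticipants <= 50) return 'sfu';         // demos and presentations
  if (expectedParticipants <= 500) return 'multi-sfu';  // large events
  return 'cascaded';                                     // enterprise broadcast
}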

Architecture at a Glance

MVP Architecture

graph TB
    subgraph Clients
        Host[Host Desktop App]
        V1[Viewer 1]
        V2[Viewer 2]
        V3[Viewer N]
    end

    subgraph ControlPlane [Control Plane - Supabase]
        Realtime[Supabase Realtime]
        DB[(PostgreSQL)]
        Auth[Supabase Auth]
    end

    subgraph MediaPlane [Media Plane - Simple]
        P2P[P2P Direct]
        SFU[Single SFU]
        TURN[TURN Server]
    end

    Host <-->|Signaling| Realtime
    V1 <-->|Signaling| Realtime
    V2 <-->|Signaling| Realtime

    Host <-->|Media| P2P
    Host <-->|Media| SFU
    SFU -->|Forward| V1
    SFU -->|Forward| V2
    SFU -->|Forward| V3

    Host <-.->|Relay| TURN

Scaled Architecture

graph TB
    subgraph Clients
        Host[Host Desktop App]
        Controllers[Controllers 1-3]
        Viewers[Viewers 1-N]
    end

    subgraph ControlPlane [Control Plane - Dedicated]
        SessionFocus[Session Focus Service]
        StateService[Room State Service]
        Realtime[Supabase Realtime]
        DB[(PostgreSQL)]
    end

    subgraph MediaPlane [Media Plane - Distributed]
        subgraph RegionA [Region A]
            SFUA1[SFU A1]
            SFUA2[SFU A2]
        end
        subgraph RegionB [Region B]
            SFUB1[SFU B1]
        end
        Cascade[Bridge Cascading]
    end

    SessionFocus -->|Assign SFU| Host
    SessionFocus -->|Assign SFU| Viewers
    Host -->|Publish| SFUA1
    SFUA1 <-->|Cascade| SFUB1
    SFUA1 -->|Forward| Controllers
    SFUB1 -->|Forward| Viewers

Component Mapping

| Component | MVP | Scaled |
| --- | --- | --- |
| Signaling | Supabase Realtime | Supabase Realtime |
| Room State | Supabase DB | Dedicated State Service |
| Topology Decision | Client-side | Session Focus Service |
| Media Routing | P2P / Single SFU | Regional SFU Pools |
| SFU Selection | Manual toggle | Automatic assignment |
| Encryption | DTLS/SRTP | Optional E2EE layer |

The 5 Critical Decisions


Decision 1: Media Plane vs Control Plane Separation

Principle: Keep media routing separate from session orchestration. They scale differently.

Why This Matters

  • Media is bandwidth-heavy, latency-sensitive
  • Control is lightweight, consistency-sensitive
  • Coupling them means you can't scale one without the other
  • Jitsi learned this: JVB (media) vs Jicofo/Prosody (control)

MVP Implementation

Media Plane:

  • P2P direct connections (default)
  • Single SFU instance for larger sessions
  • Self-hosted TURN for NAT traversal
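
For reference, a typical ICE configuration with a self-hosted TURN fallback looks like the sketch below; the hostnames and credentials are placeholders, not real PairUX endpoints.

// Sketch: STUN for address discovery, self-hosted TURN as relay of last resort.
// URLs and credentials below are placeholders.
const rtcConfig: RTCConfiguration = {
  iceServers: [
    { urls: 'stun:stun.example.com:3478' },
    {
      urls: 'turn:turn.example.com:3478?transport=udp',
      username: 'pairux',
      credential: 'use-a-short-lived-credential-here',
    },
  ],
};

const pc = new RTCPeerConnection(rtcConfig);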

Control Plane:

  • Supabase Realtime for signaling (offer/answer/ICE)
  • Supabase DB for room state (participants, roles, settings)
  • Client-side logic decides P2P vs SFU
// MVP: Client decides topology
interface SessionConfig {
  mode: 'p2p' | 'sfu';
  sfuEndpoint?: string;
}

// Control plane: Supabase handles signaling
const channel = supabase.channel(`session:${sessionId}`);
channel.on('broadcast', { event: 'signal' }, handleSignal);

What you get:

  • Simple deployment (Supabase + optional SFU)
  • No custom backend services
  • Works for 90% of use cases

Scaled Implementation

Media Plane:

  • Regional SFU pools (LiveKit Cloud or self-hosted)
  • Load balancer distributes sessions across SFUs
  • Bridge cascading for cross-region sessions

Control Plane:

  • Supabase Realtime still handles signaling
  • Dedicated "Room State Service" for complex state
  • Session Focus service assigns SFUs
// Scaled: Server assigns SFU
interface SFUAssignment {
  sfuId: string;
  region: string;
  endpoint: string;
  token: string;
}

// Session Focus service
async function assignSFU(sessionId: string, participantRegion: string): Promise<SFUAssignment> {
  const availableSFUs = await getSFUsByRegion(participantRegion);
  const bestSFU = selectByLoad(availableSFUs);
  return {
    sfuId: bestSFU.id,
    region: bestSFU.region,
    endpoint: bestSFU.endpoint,
    token: await generateSFUToken(sessionId, bestSFU.id),
  };
}

Migration Triggers

Move from MVP to Scaled when:

  • Single SFU CPU consistently >70%
  • Users in multiple geographic regions complain about latency
  • Need >100 concurrent sessions
  • Need automatic failover
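
One lightweight way to watch for these triggers is a scheduled check against SFU metrics. A minimal sketch, assuming a stats shape (cpuPercent, currentSessions) that your SFU exposes under whatever names it actually uses:

// Sketch: flag when the MVP setup is approaching its limits.
// Field names are assumptions; sample over time rather than trusting a single reading.
interface SFUHealthSnapshot {
  cpuPercent: number;       // e.g. averaged over the last 5 minutes
  currentSessions: number;
}

function shouldPlanScaleOut(stats: SFUHealthSnapshot): boolean {
  const cpuHot = stats.cpuPercent > 70;                 // "CPU consistently >70%"
  const sessionPressure = stats.currentSessions > 100;  // ">100 concurrent sessions"
  return cpuHot || sessionPressure;
}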

Decision 2: Session Controller / Orchestrator

Principle: A server-side brain should decide topology, not clients.

Why This Matters

  • Clients don't know global state (other sessions, SFU load)
  • Consistent decisions prevent split-brain scenarios
  • Enables automatic optimization and failover
  • Jitsi's Jicofo is the "conference focus" - it conducts the orchestra

MVP Implementation

Client-Side Decisions:

  • Host chooses P2P or SFU mode at session start
  • Mode is fixed for session lifetime
  • No automatic switching
// MVP: Host makes the call
interface CreateSessionRequest {
  mode: 'p2p' | 'sfu';
  maxControllers: number;
  maxViewers: number;
}

// Simple safeguards
function validateJoin(session: Session, currentCount: number): boolean {
  if (session.mode === 'p2p' && currentCount >= 25) {
    return false; // Soft limit for P2P
  }
  return true;
}

What you get:

  • No additional backend services
  • Predictable behavior
  • Host has full control

Scaled Implementation

Session Focus Service:

  • Server-side service that manages all active sessions
  • Assigns SFUs based on load, region, and session requirements
  • Handles SFU failover and rebalancing
  • Manages host transfer and role changes
// Session Focus Service
interface SessionFocus {
  sessionId: string;
  assignedSFU: string;
  participants: Map<string, ParticipantAssignment>;
  mode: 'p2p' | 'sfu' | 'cascaded';
}

class SessionFocusService {
  private sessions: Map<string, SessionFocus> = new Map();

  async onParticipantJoin(
    sessionId: string,
    participantId: string,
    region: string
  ): Promise<JoinResponse> {
    const session = this.sessions.get(sessionId);
    if (!session) {
      throw new Error(`Unknown session: ${sessionId}`);
    }

    // Decide if we need to upgrade to SFU
    if (session.mode === 'p2p' && session.participants.size >= 10) {
      await this.upgradeToSFU(sessionId);
    }

    // Assign participant to best SFU
    const assignment = await this.assignParticipant(session, participantId, region);

    return {
      sfuEndpoint: assignment.endpoint,
      token: assignment.token,
      mode: session.mode,
    };
  }

  async onSFUFailure(sfuId: string): Promise<void> {
    const affectedSessions = this.getSessionsBySFU(sfuId);
    for (const session of affectedSessions) {
      await this.migrateSession(session.sessionId);
    }
  }
}

Migration Triggers

Move from MVP to Scaled when:

  • Need automatic P2P → SFU upgrades
  • Need SFU failover without user intervention
  • Need cross-region optimization
  • Managing >50 concurrent sessions manually becomes painful

Decision 3: Horizontal SFU Scaling

Principle: Scale out with multiple SFUs, not up with bigger servers.

Why This Matters

  • Single SFU has CPU/bandwidth limits (~500-1000 participants max)
  • Geographic distribution reduces latency
  • Redundancy enables failover
  • Jitsi uses multiple JVBs with Octo for cascading

MVP Implementation

Single SFU Pool:

  • One SFU instance (or small cluster)
  • Manual region selection if needed
  • Soft limits prevent overload
// MVP: Single SFU configuration
const sfuConfig = {
  endpoint: process.env.SFU_ENDPOINT,
  maxParticipantsPerRoom: 100,
  maxConcurrentRooms: 50,
};

// Simple load check
async function canAcceptSession(): Promise<boolean> {
  const stats = await getSFUStats();
  return stats.currentRooms < sfuConfig.maxConcurrentRooms;
}

SFU Options for MVP:

| Option | Pros | Cons |
| --- | --- | --- |
| LiveKit Cloud | Managed, easy setup | Cost scales with usage |
| Self-hosted LiveKit | Full control, predictable cost | Ops overhead |
| mediasoup | Lightweight, flexible | More DIY required |

Scaled Implementation

Multi-Region SFU Pools:

  • SFU clusters in each major region
  • Load balancer per region
  • Bridge cascading for cross-region sessions
// Scaled: Regional SFU management
interface SFUPool {
  region: string;
  instances: SFUInstance[];
  loadBalancer: string;
}

interface SFUInstance {
  id: string;
  endpoint: string;
  currentLoad: number; // 0-100
  maxCapacity: number;
  healthy: boolean;
}

class SFUManager {
  private pools: Map<string, SFUPool> = new Map();

  async selectSFU(region: string, sessionSize: number): Promise<SFUInstance> {
    const pool = this.pools.get(region) ?? this.pools.get('default');
    if (!pool) {
      throw new Error(`No SFU pool configured for region ${region}`);
    }

    // Filter healthy instances with spare capacity
    const available = pool.instances.filter((sfu) => sfu.healthy && sfu.currentLoad < 80);
    if (available.length === 0) {
      throw new Error(`No healthy SFU available in ${pool.region}`);
    }

    // Select the least-loaded instance
    return available.sort((a, b) => a.currentLoad - b.currentLoad)[0];
  }

  async setupCascade(primarySFU: string, secondarySFU: string, sessionId: string): Promise<void> {
    // Connect two SFUs for cross-region forwarding
    await this.connectBridges(primarySFU, secondarySFU, sessionId);
  }
}

Bridge Cascading:

graph LR
    subgraph US-West
        Host[Host] --> SFUW[SFU West]
        C1[Controller] <--> SFUW
    end

    subgraph EU
        SFUE[SFU EU]
        V1[Viewer EU 1] <--> SFUE
        V2[Viewer EU 2] <--> SFUE
    end

    SFUW <-->|Cascade| SFUE

Migration Triggers

Move from MVP to Scaled when:

  • Single SFU CPU consistently >60%
  • Users in 2+ geographic regions
  • Need >100 viewers per session
  • Require 99.9% uptime SLA

Decision 4: End-to-End Encryption (E2EE)

Principle: Offer E2EE as an option for sensitive sessions without breaking SFU scalability.

Why This Matters

  • Default WebRTC encryption (DTLS/SRTP) protects transport
  • But SFU can see media content (it decrypts to route)
  • True E2EE means even SFU can't read media
  • Jitsi implements this with WebRTC Insertable Streams

MVP Implementation

Transport Encryption Only:

  • DTLS/SRTP encrypts all WebRTC traffic
  • SFU can see media (required for routing)
  • Sufficient for most use cases
// MVP: Standard WebRTC encryption
const peerConnection = new RTCPeerConnection({
  iceServers: [...],
  // DTLS/SRTP enabled by default
});

// No additional encryption layer needed
// Media is encrypted in transit but SFU can process it

What you get:

  • Zero additional complexity
  • Full SFU feature support (simulcast, bandwidth estimation)
  • Works in all browsers

When this is enough:

  • Internal team collaboration
  • Non-sensitive content
  • Trust your infrastructure

Scaled Implementation

Insertable Streams E2EE:

  • Encrypt media frames before sending to SFU
  • SFU forwards encrypted packets without decryption
  • Only participants with key can decrypt
// Scaled: E2EE with Insertable Streams
interface E2EEConfig {
  enabled: boolean;
  keyRotationInterval: number; // milliseconds
}

class E2EEManager {
  private encryptionKey: CryptoKey | null = null;

  async enableE2EE(sender: RTCRtpSender): Promise<void> {
    // Check browser support for the Chromium-style Insertable Streams API used below
    if (typeof (sender as any).createEncodedStreams !== 'function') {
      throw new Error('E2EE not supported in this browser');
    }

    // Generate encryption key
    this.encryptionKey = await crypto.subtle.generateKey({ name: 'AES-GCM', length: 256 }, true, [
      'encrypt',
      'decrypt',
    ]);

    // Set up transform stream
    const senderStreams = (sender as any).createEncodedStreams();
    const transformStream = new TransformStream({
      transform: async (frame, controller) => {
        const encryptedFrame = await this.encryptFrame(frame);
        controller.enqueue(encryptedFrame);
      },
    });

    senderStreams.readable.pipeThrough(transformStream).pipeTo(senderStreams.writable);
  }

  private async encryptFrame(frame: RTCEncodedVideoFrame): Promise<RTCEncodedVideoFrame> {
    const iv = crypto.getRandomValues(new Uint8Array(12));
    const encrypted = await crypto.subtle.encrypt(
      { name: 'AES-GCM', iv },
      this.encryptionKey!,
      frame.data
    );

    // Prepend IV to encrypted data
    const newData = new Uint8Array(iv.length + encrypted.byteLength);
    newData.set(iv);
    newData.set(new Uint8Array(encrypted), iv.length);

    frame.data = newData.buffer;
    return frame;
  }
}
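
The receiving side mirrors this: a transform on each RTCRtpReceiver strips the IV and decrypts before frames reach the decoder. A minimal sketch, assuming the same Chromium-style createEncodedStreams API and that the shared key has already been distributed:

// Sketch: receiver-side counterpart to E2EEManager.enableE2EE
async function enableReceiverE2EE(receiver: RTCRtpReceiver, key: CryptoKey): Promise<void> {
  const streams = (receiver as any).createEncodedStreams();

  const transformStream = new TransformStream({
    transform: async (frame, controller) => {
      const data = new Uint8Array(frame.data);
      const iv = data.slice(0, 12);   // IV was prepended by the sender
      const ciphertext = data.slice(12);

      frame.data = await crypto.subtle.decrypt({ name: 'AES-GCM', iv }, key, ciphertext);
      controller.enqueue(frame);
    },
  });

  streams.readable.pipeThrough(transformStream).pipeTo(streams.writable);
}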

Browser Support Matrix:

| Browser | Insertable Streams Status |
| --- | --- |
| Chrome 86+ | Full support |
| Edge 86+ | Full support |
| Firefox | Not supported |
| Safari 15.4+ | Full support |

Migration Triggers

Add E2EE when:

  • Handling sensitive/confidential content
  • Enterprise customers require it
  • Compliance requirements (HIPAA, etc.)
  • Users explicitly request it

Note: E2EE is a feature toggle, not a migration. You can offer both modes.


Decision 5: Role-Based Participant Model

Principle: Different roles have different capabilities and resource allocations.

Why This Matters

  • Not everyone needs to send high-quality video
  • Controllers need low latency; viewers can tolerate more
  • Explicit roles prevent "everyone sends everything" chaos
  • Enables cost optimization (viewers are cheaper than publishers)

MVP Implementation

Three Simple Roles:

| Role | Capabilities | Limits |
| --- | --- | --- |
| Host | Screen share, grant control | 1 per session |
| Controller | View, request/receive control | Max 3 per session |
| Viewer | View only | ~25 P2P, 100+ SFU |
// MVP: Simple role model
type ParticipantRole = 'host' | 'controller' | 'viewer';

interface Participant {
  id: string;
  role: ParticipantRole;
  controlState: 'none' | 'requested' | 'granted';
}

// Permissions referenced below
type Permission =
  | 'share_screen'
  | 'grant_control'
  | 'revoke_control'
  | 'end_session'
  | 'view'
  | 'request_control'
  | 'send_input';

// Role-based permissions
const rolePermissions: Record<ParticipantRole, Permission[]> = {
  host: ['share_screen', 'grant_control', 'revoke_control', 'end_session'],
  controller: ['view', 'request_control', 'send_input'],
  viewer: ['view'],
};

function canPerformAction(participant: Participant, action: string): boolean {
  return rolePermissions[participant.role].includes(action as Permission);
}

What you get:

  • Clear permission boundaries
  • Simple UI (role determines what you see)
  • Predictable resource usage

Scaled Implementation

Dynamic Roles with Quality Tiers:

// Scaled: Dynamic roles with quality allocation
interface ParticipantConfig {
  role: ParticipantRole;
  qualityTier: 'high' | 'standard' | 'low';
  maxBitrate: number;
  priority: number;
}

const roleConfigs: Record<ParticipantRole, ParticipantConfig> = {
  host: {
    role: 'host',
    qualityTier: 'high',
    maxBitrate: 4_000_000, // 4 Mbps
    priority: 1,
  },
  controller: {
    role: 'controller',
    qualityTier: 'high',
    maxBitrate: 4_000_000,
    priority: 2,
  },
  viewer: {
    role: 'viewer',
    qualityTier: 'standard',
    maxBitrate: 2_000_000, // 2 Mbps
    priority: 3,
  },
};

// Bandwidth allocation under congestion
class BandwidthAllocator {
  allocate(participants: Participant[], availableBandwidth: number): Map<string, number> {
    const allocations = new Map<string, number>();

    // Sort by priority
    const sorted = [...participants].sort(
      (a, b) => roleConfigs[a.role].priority - roleConfigs[b.role].priority
    );

    let remaining = availableBandwidth;
    sorted.forEach((p, index) => {
      const config = roleConfigs[p.role];
      // Split what's left evenly among participants not yet served,
      // capped at the role's max bitrate (higher priority is served first)
      const allocation = Math.min(config.maxBitrate, remaining / (sorted.length - index));
      allocations.set(p.id, allocation);
      remaining -= allocation;
    });

    return allocations;
  }
}

Host Transfer:

// Scaled: Dynamic host transfer
async function transferHost(sessionId: string, newHostId: string): Promise<void> {
  const session = await getSession(sessionId);
  const currentHost = session.participants.find((p) => p.role === 'host');
  const newHost = session.participants.find((p) => p.id === newHostId);

  if (!currentHost) {
    throw new Error('Session has no active host');
  }
  if (!newHost || newHost.role === 'viewer') {
    throw new Error('New host must be a controller');
  }

  // Update roles
  await updateParticipantRole(sessionId, currentHost.id, 'controller');
  await updateParticipantRole(sessionId, newHostId, 'host');

  // Notify all participants
  await broadcastRoleChange(sessionId, {
    previousHost: currentHost.id,
    newHost: newHostId,
  });

  // New host needs to start screen sharing
  // (handled client-side after notification)
}

Migration Triggers

Move from MVP to Scaled when:

  • Need dynamic host transfer
  • Need quality differentiation by role
  • Need bandwidth allocation policies
  • Enterprise customers need custom roles

Cost Implications

Tier 1: MVP - P2P Focus

Monthly Cost: ~$50-150

| Component | Service | Cost |
| --- | --- | --- |
| Control Plane | Supabase Pro | $25/month |
| TURN Server | DigitalOcean Droplet | $24/month |
| Web Hosting | Railway/Vercel | $0-20/month |
| Domain + SSL | Cloudflare | $10/year |

What you're paying for:

  • Supabase handles auth, DB, and realtime signaling
  • TURN server for NAT traversal (only used when P2P fails)
  • No SFU costs - media goes directly between peers

Best for:

  • <1,000 monthly active users
  • <50 concurrent sessions
  • Sessions with <10 participants each

What you can skip:

  • SFU infrastructure
  • Multi-region deployment
  • Dedicated orchestrator

Tier 2: SFU-Enabled

Monthly Cost: ~$200-600

| Component | Service | Cost |
| --- | --- | --- |
| Control Plane | Supabase Pro | $25/month |
| SFU | LiveKit Cloud or self-hosted | $100-400/month |
| TURN Server | DigitalOcean Droplet | $24/month |
| Web Hosting | Railway | $20/month |

SFU Cost Breakdown:

| Option | Pricing Model | Estimated Cost |
| --- | --- | --- |
| LiveKit Cloud | Per participant-minute | ~$0.004/min |
| Self-hosted LiveKit | Server cost | $100-200/month (4 vCPU) |
| mediasoup (self-hosted) | Server cost | $100-200/month |
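
As a rough, back-of-the-envelope comparison of the two pricing models (using the figures above; actual rates will differ):

// Assumptions: $0.004 per participant-minute (LiveKit Cloud) vs ~$150/month fixed (self-hosted midpoint)
const perParticipantMinute = 0.004; // USD
const selfHostedMonthly = 150;      // USD

// Example month: 40 sessions × 20 participants × 60 minutes
const participantMinutes = 40 * 20 * 60;                        // 48,000
const cloudMonthly = participantMinutes * perParticipantMinute; // ≈ $192

// Break-even: self-hosting wins above ~37,500 participant-minutes per month
const breakEvenMinutes = selfHostedMonthly / perParticipantMinute; // 37,500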

What you're paying for:

  • SFU offloads host bandwidth
  • Supports larger sessions (50-100 viewers)
  • Better quality consistency

Best for:

  • 1,000-10,000 monthly active users
  • <200 concurrent sessions
  • Sessions with 10-100 participants

What you can skip:

  • Multi-region SFUs
  • Bridge cascading
  • Dedicated orchestrator

Tier 3: Multi-Region

Monthly Cost: ~$1,000-3,000

| Component | Service | Cost |
| --- | --- | --- |
| Control Plane | Supabase Pro + Edge Functions | $50-100/month |
| SFU Pools | 2-3 regions × 2 instances | $600-1,500/month |
| Session Focus | Railway service | $50-100/month |
| TURN Servers | 2-3 regions | $75-150/month |
| Monitoring | Datadog/Grafana | $100-200/month |

What you're paying for:

  • Low latency for global users
  • Automatic failover
  • Server-side orchestration
  • Professional monitoring

Best for:

  • 10,000-100,000 monthly active users
  • Global user base
  • 99.9% uptime requirements

What you can skip:

  • Bridge cascading (unless single sessions span regions)
  • E2EE (unless required)
  • Custom SFU modifications

Tier 4: Enterprise Scale

Monthly Cost: $5,000+

| Component | Service | Cost |
| --- | --- | --- |
| Control Plane | Dedicated infrastructure | $500+/month |
| SFU Pools | 5+ regions, auto-scaling | $2,000+/month |
| Bridge Cascading | Cross-region mesh | $500+/month |
| E2EE Infrastructure | Key management | $200+/month |
| 24/7 Monitoring | Full observability stack | $500+/month |
| Support | On-call engineering | Variable |

What you're paying for:

  • Massive scale (1000+ concurrent sessions)
  • Enterprise SLAs
  • E2EE for compliance
  • Dedicated support

Best for:

  • 100,000+ monthly active users
  • Enterprise customers
  • Compliance requirements (HIPAA, SOC2)

Cost Optimization Tips

  1. Start with P2P - It's free and works for most sessions
  2. Use SFU selectively - Only for sessions that need it
  3. Right-size SFU instances - Don't over-provision
  4. Monitor actual usage - Scale based on data, not fear
  5. Consider LiveKit Cloud - Pay-per-use beats fixed costs at low scale
  6. Self-host at scale - Fixed costs beat per-minute at high scale

Implementation Roadmap

Phase 1: MVP (Current)

Architecture:

  • P2P default, SFU toggle
  • Supabase for everything
  • Single TURN server

Capabilities:

  • 1 host + up to 25 viewers (P2P)
  • 1 host + up to 100 viewers (SFU)
  • Manual mode selection

Focus:

  • Get core functionality working
  • Validate product-market fit
  • Keep infrastructure simple

Phase 2: SFU Hardening

Trigger: Consistent demand for >10 viewer sessions

Changes:

  • Deploy dedicated SFU (LiveKit recommended)
  • Add SFU health monitoring
  • Implement automatic P2P → SFU suggestion
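
A client-side version of that suggestion can be as simple as the sketch below; the threshold mirrors the upgrade point used in Decision 2, and the function name is ours.

// Sketch: nudge the host toward SFU mode when a P2P session gets crowded
const P2P_SUGGEST_SFU_AT = 10;

function maybeSuggestSFU(
  mode: 'p2p' | 'sfu',
  participantCount: number,
  notifyHost: (message: string) => void
): void {
  if (mode === 'p2p' && participantCount >= P2P_SUGGEST_SFU_AT) {
    notifyHost('This session is getting large. Switching to SFU mode will reduce your upload load.');
  }
}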

New Capabilities:

  • Reliable 100+ viewer sessions
  • Better quality consistency
  • Basic analytics

Phase 3: Multi-Region

Trigger: Users in multiple continents, latency complaints

Changes:

  • Deploy SFU in 2-3 regions
  • Add Session Focus service
  • Implement automatic region selection
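
Automatic region selection can start as a simple latency probe from the client; the endpoints and /ping path below are placeholders.

// Sketch: time a small request to each regional endpoint and pick the fastest
const REGION_ENDPOINTS: Record<string, string> = {
  'us-west': 'https://sfu-usw.example.com',
  'eu': 'https://sfu-eu.example.com',
};

async function probeRegion(baseUrl: string): Promise<number> {
  const start = performance.now();
  await fetch(`${baseUrl}/ping`, { method: 'HEAD', cache: 'no-store' });
  return performance.now() - start;
}

async function selectRegion(): Promise<string> {
  const results = await Promise.all(
    Object.entries(REGION_ENDPOINTS).map(async ([region, url]) => ({
      region,
      rtt: await probeRegion(url).catch(() => Number.POSITIVE_INFINITY),
    }))
  );
  return results.sort((a, b) => a.rtt - b.rtt)[0].region;
}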

New Capabilities:

  • <100ms latency globally
  • Automatic failover
  • Region-aware routing

Phase 4: E2EE Option

Trigger: Enterprise customers, compliance requirements

Changes:

  • Implement Insertable Streams encryption
  • Add key exchange protocol
  • Create E2EE toggle in UI
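
One common shape for the key exchange mentioned above is ECDH over the existing signaling channel. A minimal WebCrypto sketch, assuming public keys are exchanged via Supabase Realtime:

// Sketch: derive a shared AES-GCM key per pair of participants via ECDH.
// How public keys travel (and how rotation is triggered) is up to the signaling layer.
async function generateE2EEKeyPair(): Promise<CryptoKeyPair> {
  return crypto.subtle.generateKey({ name: 'ECDH', namedCurve: 'P-256' }, false, ['deriveKey']);
}

async function deriveSharedKey(myPrivateKey: CryptoKey, peerPublicKey: CryptoKey): Promise<CryptoKey> {
  // The resulting key can be handed to the E2EEManager / receiver transform above
  return crypto.subtle.deriveKey(
    { name: 'ECDH', public: peerPublicKey },
    myPrivateKey,
    { name: 'AES-GCM', length: 256 },
    false,
    ['encrypt', 'decrypt']
  );
}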

New Capabilities:

  • True end-to-end encryption
  • Compliance-ready
  • Privacy-focused option

Phase 5: Large-Scale Events

Trigger: Demand for 500+ viewer sessions

Changes:

  • Implement bridge cascading
  • Add dynamic scaling
  • Optimize for broadcast scenarios

New Capabilities:

  • 1000+ viewer sessions
  • Cross-region single sessions
  • Broadcast mode

What NOT to Build Until Needed

| Feature | Build When | Not Before |
| --- | --- | --- |
| Multi-region SFU | Users in 2+ continents | Single region works |
| Session Focus service | >50 concurrent sessions | Manual works |
| Bridge cascading | >500 viewers per session | Single SFU works |
| E2EE | Enterprise/compliance demand | Transport encryption works |
| Custom SFU | Unique requirements | LiveKit/mediasoup work |
| Auto-scaling | Unpredictable load spikes | Fixed capacity works |

Appendix: SFU Comparison

LiveKit

Pros:

  • Modern, well-documented
  • Cloud offering available
  • Good TypeScript SDK
  • Active development

Cons:

  • Newer, smaller community
  • Cloud pricing can add up

Best for: Teams wanting managed option with self-host path

mediasoup

Pros:

  • Lightweight, flexible
  • Large community
  • Battle-tested
  • No licensing costs

Cons:

  • More DIY required
  • Node.js only (C++ addon)
  • Steeper learning curve

Best for: Teams with WebRTC expertise wanting full control

Janus

Pros:

  • Very mature
  • Plugin architecture
  • Supports many protocols

Cons:

  • C codebase
  • Complex configuration
  • Heavier weight

Best for: Complex multi-protocol scenarios

Recommendation

Start with LiveKit - best balance of ease and capability. Migrate to self-hosted mediasoup if you need more control or cost optimization at scale.


Related Documentation