Yousef Kakhki | System Architect & Infrastructure Lead

Introduction

Building real-time collaborative applications at scale is one of the most challenging problems in modern web engineering. When we set out to create a virtual classroom platform capable of hosting thousands of concurrent users, we quickly discovered that no single technology could solve the problem elegantly. WebRTC delivers the low-latency, interactive experience users expect, but a single room becomes difficult and expensive to scale beyond a few hundred participants. Traditional streaming protocols like HLS can reach millions of viewers, but their 6-10 second latency makes real-time interaction impossible.

This is the story of how we built a hybrid architecture that combines the best of both worlds—and the engineering decisions that made it possible.

The Problem: WebRTC's Scalability Ceiling

WebRTC excels at low-latency, bidirectional communication. It's the technology powering Google Meet and countless other browser-based video conferencing applications. However, WebRTC was designed for small-group communication, not broadcast scenarios.

A single LiveKit SFU (Selective Forwarding Unit) node can typically handle 200-300 participants in one room before quality degrades, depending on hardware and on how many tracks are published. Beyond this point, several issues emerge:

- CPU and bandwidth exhaustion on the SFU as it forwards media streams to each participant
- Client-side resource constraints as browsers struggle to decode multiple video streams
- Network complexity: an SFU forwards each published track to every subscriber, so the number of forwarded streams grows roughly quadratically when many participants publish

Meanwhile, our virtual classroom platform needed to support 1000+ concurrent viewers while maintaining real-time interactivity for teachers and active students. A traditional lecture might have one teacher, 10-20 active participants asking questions, and hundreds of passive viewers watching the stream.
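A rough back-of-the-envelope model makes the gap concrete. The sketch below assumes one video track per publisher and that the SFU forwards every track to every other participant, which is an upper bound (adaptive streaming and similar features reduce it in practice). The function name and the participant counts are illustrative, and the egress service is modelled simply as one extra subscriber.

```typescript
// Upper-bound model: every published track is forwarded to every other participant.
function forwardedStreams(publishers: number, participants: number): number {
  if (publishers > participants) {
    throw new Error("publishers cannot exceed participants");
  }
  // Each publisher's track goes to everyone except the publisher itself.
  return publishers * (participants - 1);
}

// Everyone on WebRTC: 1 teacher + 20 active students publish, 1,000 viewers subscribe.
const allWebRtc = forwardedStreams(21, 1021);

// Two tiers: the same 21 publishers plus one subscriber standing in for the egress
// service; the 1,000 viewers pull HLS from the CDN and never touch the SFU.
const twoTier = forwardedStreams(21, 22);

console.log({ allWebRtc, twoTier });
```

Under these assumptions the SFU forwards 21,420 streams in the all-WebRTC case and 441 in the two-tier case, which is the kind of gap a hybrid design aims to exploit.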

The challenge became clear: How do we combine the intimacy of WebRTC with the scalability of broadcast streaming?

The Hybrid Solution: Two-Tier Architecture

We designed a two-tier architecture that separates users by their role and interaction level:

| Tier | Technology | Latency | Capacity | Use Case |
|------|------------|---------|----------|----------|
| Interactive | LiveKit WebRTC | ~100ms | ~200 users | Teachers, active students |
| Passive | HLS via Egress | ~6-10s | Limited only by CDN capacity | Observers, late joiners |

The key insight is that in most educational scenarios, only a small percentage of users need bidirectional communication at any given time. The majority are passive consumers who benefit from the reliability and scalability of HLS.

Architecture Diagram

┌─────────────────────────────────────────────────────────────┐
│                      LiveKit SFU Cluster                    │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐         │
│  │   Teacher   │  │  Student A  │  │  Student B  │         │
│  │  (publish)  │  │ (pub/sub)   │  │ (pub/sub)   │         │
│  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘         │
│         └────────────────┼────────────────┘                 │
│                          ▼                                  │
│                 ┌─────────────────┐                         │
│                 │  Egress Service │                         │
│                 │  (Composite)    │                         │
│                 └────────┬────────┘                         │
└──────────────────────────┼──────────────────────────────────┘
                           ▼
                  ┌─────────────────┐
                  │   HLS Origin    │
                  │   (S3/CDN)      │
                  └────────┬────────┘
                           │
           ┌───────────────┼───────────────┐
           ▼               ▼               ▼
    ┌───────────┐   ┌───────────┐   ┌───────────┐
    │ Viewer 1  │   │ Viewer 2  │   │ Viewer N  │
    │   (HLS)   │   │   (HLS)   │   │   (HLS)   │
    └───────────┘   └───────────┘   └───────────┘
         1000+ Passive Viewers (Scalable via CDN)

The ha-api Service: Orchestration Layer

The ha-api service is the brain of our hybrid architecture. Built with Node.js and Express, it handles:

- Room lifecycle management - Creating rooms, starting/stopping egress
- Token generation - Issuing appropriate LiveKit tokens based on user role
- User promotion/demotion - Moving users between interactive and passive tiers
- Signal bridging - Connecting passive viewers to the interactive layer
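To illustrate the token-generation responsibility, here is a minimal sketch of how a role could map to LiveKit permissions. The role names and the `grantForRole` helper are hypothetical; only the grant field names (`roomJoin`, `canPublish`, `canSubscribe`, `canPublishData`) mirror LiveKit's video grant. The sketch assumes passive viewers receive no token at all, since they only watch the HLS stream.

```typescript
// Hypothetical role names, used for illustration only.
type Role = "teacher" | "active-student" | "passive-viewer";

// Field names mirror LiveKit's video grant.
interface RoomGrant {
  room: string;
  roomJoin: boolean;
  canPublish: boolean;
  canSubscribe: boolean;
  canPublishData: boolean;
}

// Returns the grant to embed in a LiveKit token, or null when no token is needed.
function grantForRole(role: Role, room: string): RoomGrant | null {
  if (role === "passive-viewer") {
    // Passive viewers stay on HLS and never join the WebRTC room.
    return null;
  }
  return {
    room,
    roomJoin: true,
    canPublish: true,
    canSubscribe: true,
    canPublishData: true,
  };
}

console.log(grantForRole("teacher", "algebra-101"));
console.log(grantForRole("passive-viewer", "algebra-101"));
```

Keeping the decision in one function means promotion and demotion only have to change the role and ask for a new grant.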

Here's how we initialize HLS egress when a room is created:

import { EgressClient, SegmentedFileProtocol } from 'livekit-server-sdk';

// Credentials come from the environment; the non-null assertions assume they are set.
const egressClient = new EgressClient(
  process.env.LIVEKIT_URL!,
  process.env.LIVEKIT_API_KEY!,
  process.env.LIVEKIT_API_SECRET!
);

// Starts a room-composite egress that writes HLS segments to S3 and returns the egress ID.
// Note: the exact shape of the output argument differs between livekit-server-sdk versions;
// check it against the type definitions of the version you use.
async function startRoomEgress(roomName: string): Promise<string> {
  const egress = await egressClient.startRoomCompositeEgress(
    roomName,
    {
      segmentOutputs: [{
        protocol: SegmentedFileProtocol.HLS_PROTOCOL,
        filenamePrefix: `hls/${roomName}/stream`,
        playlistName: 'playlist.m3u8',
        segmentDuration: 4, // seconds per segment
        s3: {
          bucket: process.env.S3_BUCKET!,
          region: process.env.S3_REGION!,
          accessKey: process.env.S3_ACCESS_KEY!,
          secret: process.env.S3_SECRET_KEY!,
        },
      }],
    },
    {
      layout: 'speaker',
      audioOnly: false,
      videoOnly: false,
    }
  );

  return egress.egressId;
}
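The segment duration largely determines how far behind live a viewer ends up. As a rough model, the delay is the segment length multiplied by the number of segments the player keeps between itself and the live edge, plus encoding and upload overhead. The sketch below uses that model; the function name and the input values are illustrative assumptions, not measurements from our platform.

```typescript
interface HlsDelayInputs {
  segmentSeconds: number;          // length of each HLS segment
  segmentsBehindLive: number;      // how many segments the player stays behind the live edge
  pipelineOverheadSeconds: number; // encode + upload + CDN propagation (assumed)
}

// Rough estimate of viewer delay; real players and CDNs add variance.
function estimateHlsDelaySeconds(inputs: HlsDelayInputs): number {
  return inputs.segmentSeconds * inputs.segmentsBehindLive + inputs.pipelineOverheadSeconds;
}

const fourSecondSegments = estimateHlsDelaySeconds({
  segmentSeconds: 4,
  segmentsBehindLive: 3,
  pipelineOverheadSeconds: 1,
});

const twoSecondSegments = estimateHlsDelaySeconds({
  segmentSeconds: 2,
  segmentsBehindLive: 3,
  pipelineOverheadSeconds: 1,
});

console.log({ fourSecondSegments, twoSecondSegments });
```

Shorter segments reduce the estimate, but they also leave the player a smaller buffer to absorb network jitter, so the choice is a trade-off rather than a pure win.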

Deep Dive: NATS JetStream Signaling

The most interesting engineering challenge we faced was this: How does a passive HLS viewer (who has no WebRTC connection to the room) signal the moderator that they want to speak?

In a traditional WebRTC setup, participants use data channels or the signaling server to send messages. But our passive viewers are completely disconnected from the LiveKit infrastructure—they're just watching an HLS stream through a CDN.

We solved this with a side-channel signaling system using NATS JetStream.

Why NATS JetStream?

We evaluated several options for our signaling backbone:

| Technology | Pros | Cons / Verdict |
|------------|------|----------------|
| Redis Pub/Sub | Simple, fast | No persistence, no replay |
| Kafka | Durable, scalable | Heavy, complex setup |
| RabbitMQ | Mature, reliable | Doesn't fit event-streaming model |
| NATS JetStream | Lightweight, persistent, exactly-once semantics | Best fit for our requirements |

NATS JetStream gave us the best of all worlds: the simplicity of Redis with the durability of Kafka, all in a single lightweight binary.

Signal Flow Architecture

┌──────────────────┐                              ┌──────────────────┐
│  Passive Viewer  │                              │    Moderator     │
│   (HLS Client)   │                              │ (WebRTC Client)  │
└────────┬─────────┘                              └────────┬─────────┘
         │                                                  │
         │ HTTP POST /api/rooms/{id}/raise-hand            │
         ▼                                                  │
┌──────────────────┐                                       │
│     ha-api       │                                       │
│  (Express.js)    │                                       │
└────────┬─────────┘                                       │
         │                                                  │
         │ js.publish('room.{id}.signal.raise-hand')       │
         ▼                                                  │
┌──────────────────┐                                       │
│  NATS JetStream  │                                       │
│    (Stream)      │                                       │
└────────┬─────────┘                                       │
         │                                                  │
         │ Consumer subscription                           │
         ▼                                                  │
┌──────────────────┐                                       │
│     ha-api       │──────── SSE Push ─────────────────────▶
│  (SSE Endpoint)  │         'RAISE_HAND' event            │
└──────────────────┘                                       ▼
                                              ┌──────────────────┐
                                              │  Moderator sees  │
                                              │  raise-hand UI   │
                                              └──────────────────┘

Implementation Details

NATS Stream Configuration:

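import { connect, RetentionPolicy, StorageType } from 'nats';

async function setupNatsStreams() {
  const nc = await connect({ servers: process.env.NATS_URL });
  const jsm = await nc.jetstreamManager();

  // Create stream for room signals.
  // '*' matches exactly one subject token: room.<roomId>.signal.<signalName>
  // Note: per-user subjects such as room.<roomId>.user.<userId>.event are not
  // captured by this pattern and need their own entry or stream.
  await jsm.streams.add({
    name: 'ROOM_SIGNALS',
    subjects: ['room.*.signal.*'],
    retention: RetentionPolicy.Limits,
    storage: StorageType.Memory, // in-memory: messages are lost if the server restarts
    max_age: 3600 * 1e9, // 1 hour in nanoseconds
    max_msgs_per_subject: 1000,
  });

  return nc;
}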
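The stream only stores messages whose subject matches one of its subject patterns, so it helps to be precise about wildcard behaviour. The sketch below is a small stand-alone matcher, assuming the stream is meant to use the pattern `room.*.signal.*`. In NATS, `*` matches exactly one token and `>` matches one or more trailing tokens. The `subjectMatches` helper is hypothetical and not part of the NATS client.

```typescript
// Minimal re-implementation of NATS subject matching, for illustration only.
function subjectMatches(pattern: string, subject: string): boolean {
  const patternTokens = pattern.split(".");
  const subjectTokens = subject.split(".");

  for (let i = 0; i < patternTokens.length; i++) {
    const token = patternTokens[i];
    if (token === ">") {
      // '>' is only valid as the last token and needs at least one token to match.
      return i === patternTokens.length - 1 && subjectTokens.length > i;
    }
    if (i >= subjectTokens.length) {
      return false; // subject is shorter than the pattern
    }
    if (token !== "*" && token !== subjectTokens[i]) {
      return false; // literal token mismatch
    }
  }
  // Without '>', both sides must have the same number of tokens.
  return patternTokens.length === subjectTokens.length;
}

console.log(subjectMatches("room.*.signal.*", "room.42.signal.raise-hand")); // true
console.log(subjectMatches("room.*.signal.*", "room.42.user.7.event"));      // false
console.log(subjectMatches("room.>", "room.42.user.7.event"));               // true
```

The second call shows why a per-user event subject needs its own pattern: it has a different shape from the signal subjects.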

Publishing a Raise-Hand Event:

// Assumes `js` is a JetStream client created once at startup, e.g. const js = nc.jetstream();
app.post('/api/rooms/:roomId/raise-hand', authenticate, async (req, res) => {
  const { roomId } = req.params;
  const { userId, displayName } = req.user;

  const subject = `room.${roomId}.signal.raise-hand`;
  const payload = JSON.stringify({
    type: 'RAISE_HAND',
    userId,
    displayName,
    timestamp: Date.now(),
    metadata: {
      viewerType: 'passive',
      connectionId: req.headers['x-connection-id'],
    },
  });

  await js.publish(subject, payload, {
    // Deduplication only works when retries reuse the same ID, so the ID must not
    // contain a timestamp. A repeated raise within the stream's duplicate window
    // is dropped as well.
    msgID: `raise-hand-${roomId}-${userId}`,
  });

  res.json({ success: true, message: 'Hand raised successfully' });
});
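JetStream deduplicates on the message ID within a configurable time window (two minutes by default), so a retried request is only recognized as a duplicate if it carries exactly the same ID. The sketch below simulates that behaviour in memory to show the difference between a stable ID and one that embeds a timestamp. The `DedupWindow` class is a hypothetical stand-in for the server-side logic, not a NATS API.

```typescript
// In-memory stand-in for a JetStream duplicate window, for illustration only.
class DedupWindow {
  private readonly windowMs: number;
  private readonly firstSeen = new Map<string, number>();

  constructor(windowMs: number) {
    this.windowMs = windowMs;
  }

  // Returns true if the message is accepted, false if it is dropped as a duplicate.
  accept(msgId: string, nowMs: number): boolean {
    const seenAt = this.firstSeen.get(msgId);
    if (seenAt !== undefined && nowMs - seenAt < this.windowMs) {
      return false;
    }
    this.firstSeen.set(msgId, nowMs);
    return true;
  }
}

// Stable ID: the retry 500 ms later is dropped.
const stable = new DedupWindow(120000);
const stableFirst = stable.accept("raise-hand-room1-u1", 0);
const stableRetry = stable.accept("raise-hand-room1-u1", 500);

// Timestamped ID: every retry looks like a new message, so nothing is deduplicated.
const timestamped = new DedupWindow(120000);
const stampedFirst = timestamped.accept("raise-hand-u1-0", 0);
const stampedRetry = timestamped.accept("raise-hand-u1-500", 500);

console.log({ stableFirst, stableRetry, stampedFirst, stampedRetry });
```

The trade-off with a stable ID is that a legitimate second raise inside the window is also dropped, which may or may not be acceptable for a given product.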

SSE Endpoint for Moderators:

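app.get('/api/rooms/:roomId/events', authenticate, requireModerator, async (req, res) => {
  const { roomId } = req.params;

  // Set up SSE headers
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');
  res.flushHeaders();

  // Subscribe to room signals.
  // Assumes a durable consumer named moderator-<roomId> already exists on ROOM_SIGNALS.
  const consumer = await js.consumers.get('ROOM_SIGNALS', `moderator-${roomId}`);
  const messages = await consumer.consume();

  // Register cleanup before entering the loop: the loop only ends once
  // messages.stop() is called, so a handler added after it would never be registered.
  req.on('close', () => {
    messages.stop();
  });

  const decoder = new TextDecoder();
  for await (const msg of messages) {
    // msg.data is a Uint8Array, so decode it explicitly before parsing
    const event = JSON.parse(decoder.decode(msg.data));
    res.write(`event: ${event.type}\n`);
    res.write(`data: ${JSON.stringify(event)}\n\n`);
    msg.ack();
  }
});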
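The wire format behind those `res.write` calls is simple: an `event:` line, a `data:` line, and a blank line that terminates the event. A minimal sketch of a helper that builds one frame is shown below; `formatSseEvent` is a hypothetical name, and the payload fields are illustrative.

```typescript
// Builds a single Server-Sent Events frame: event line, data line, blank line.
function formatSseEvent(eventType: string, data: unknown): string {
  if (/[\r\n]/.test(eventType)) {
    throw new Error("event type must be a single line");
  }
  // JSON.stringify escapes newlines inside strings, so the payload stays on one data line.
  const json = JSON.stringify(data);
  return `event: ${eventType}\ndata: ${json}\n\n`;
}

const frame = formatSseEvent("RAISE_HAND", { userId: "u1", displayName: "Sam" });
console.log(JSON.stringify(frame));
```

On the client, `EventSource` dispatches this frame to listeners registered with `addEventListener('RAISE_HAND', ...)` and exposes the JSON text as `event.data`.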

The Seamless Promotion Flow

The crown jewel of our architecture is the promotion flow—the ability to instantly upgrade a passive HLS viewer to an active WebRTC participant without any page reload or context loss.

The User Journey

1. Passive viewer watches HLS stream - Low bandwidth, high latency, but scalable
2. Viewer raises hand - HTTP request to ha-api → NATS → SSE to moderator
3. Moderator approves - Clicks "Promote" button in their UI
4. ha-api generates LiveKit token - With canPublish: true and canSubscribe: true
5. SSE pushes PROMOTE event - Contains the new token and room info
6. React client hot-swaps components - HLS player unmounts, LiveKit room mounts
7. User is now interactive - Can speak, share video, participate fully
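The journey above can be modelled as a small state machine, which makes the allowed transitions explicit and easy to test in isolation. The sketch below is one possible formulation; the event names other than `PROMOTE` and `DEMOTE`, the `reduceViewer` function, and the placeholder token are assumptions made for illustration.

```typescript
type ViewerMode = "passive" | "transitioning" | "interactive";

interface ViewerState {
  mode: ViewerMode;
  token: string | null;
  handRaised: boolean;
}

type ViewerEvent =
  | { type: "RAISE_HAND" }
  | { type: "PROMOTE"; token: string }
  | { type: "DEMOTE" }
  | { type: "TRANSITION_DONE" };

const initialState: ViewerState = { mode: "passive", token: null, handRaised: false };

// Pure transition function: events that are invalid for the current mode are ignored.
function reduceViewer(state: ViewerState, event: ViewerEvent): ViewerState {
  switch (event.type) {
    case "RAISE_HAND":
      return state.mode === "passive" ? { ...state, handRaised: true } : state;
    case "PROMOTE":
      return state.mode === "passive"
        ? { ...state, mode: "transitioning", token: event.token }
        : state;
    case "DEMOTE":
      return state.mode === "interactive"
        ? { ...state, mode: "transitioning", token: null }
        : state;
    case "TRANSITION_DONE":
      if (state.mode !== "transitioning") return state;
      // A token means we are heading to WebRTC; no token means back to HLS.
      return state.token
        ? { mode: "interactive", token: state.token, handRaised: false }
        : { mode: "passive", token: null, handRaised: false };
    default:
      return state;
  }
}

const events: ViewerEvent[] = [
  { type: "RAISE_HAND" },
  { type: "PROMOTE", token: "jwt-placeholder" },
  { type: "TRANSITION_DONE" },
  { type: "DEMOTE" },
  { type: "TRANSITION_DONE" },
];

let state = initialState;
const modes: ViewerMode[] = [];
for (const event of events) {
  state = reduceViewer(state, event);
  modes.push(state.mode);
}
console.log(modes.join(" -> "));
```

A reducer like this would slot into React's `useReducer`, with the SSE listeners dispatching `PROMOTE` and `DEMOTE` and a timer dispatching `TRANSITION_DONE`.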

Client-Side Implementation (React)

import { useEffect, useState, useCallback } from 'react';
import { LiveKitRoom, VideoConference } from '@livekit/components-react';
import HLSPlayer from './HLSPlayer';

type ViewerMode = 'passive' | 'interactive' | 'transitioning';

interface RoomViewerProps {
  roomId: string;
  hlsUrl: string;
  userId: string;
}

export function RoomViewer({ roomId, hlsUrl, userId }: RoomViewerProps) {
  const [viewerMode, setViewerMode] = useState<ViewerMode>('passive');
  const [livekitToken, setLivekitToken] = useState<string | null>(null);
  const [isHandRaised, setIsHandRaised] = useState(false);

  // SSE connection for receiving events
  useEffect(() => {
    const eventSource = new EventSource(
      `/api/rooms/${roomId}/user-events?userId=${userId}`
    );
    
    eventSource.addEventListener('PROMOTE', (event) => {
      const data = JSON.parse(event.data);
      console.log('Promotion received!', data);
      
      setViewerMode('transitioning');
      setLivekitToken(data.token);
      
      // Small delay to ensure clean transition
      setTimeout(() => {
        setViewerMode('interactive');
        setIsHandRaised(false);
      }, 500);
    });
    
    eventSource.addEventListener('DEMOTE', (event) => {
      console.log('Demotion received');
      setViewerMode('transitioning');
      
      setTimeout(() => {
        setLivekitToken(null);
        setViewerMode('passive');
      }, 500);
    });
    
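    eventSource.onerror = (error) => {
      console.error('SSE connection error:', error);
      // EventSource retries automatically; add custom handling here for the case
      // where the connection is permanently closed (readyState === EventSource.CLOSED)
    };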
    
    return () => eventSource.close();
  }, [roomId, userId]);
  
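  const handleRaiseHand = useCallback(async () => {
    try {
      const response = await fetch(`/api/rooms/${roomId}/raise-hand`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
      });
      // fetch only rejects on network failure, so check the HTTP status as well
      if (!response.ok) {
        throw new Error(`Raise hand failed with status ${response.status}`);
      }
      setIsHandRaised(true);
    } catch (error) {
      console.error('Failed to raise hand:', error);
    }
  }, [roomId]);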
  
  // Render based on current mode
  if (viewerMode === 'transitioning') {
    return (
      <div>
        <div>
          {/* Loading spinner */}
          <div />
          <p>Connecting to live session...</p>
        </div>
      </div>
    );
  }

  if (viewerMode === 'interactive' && livekitToken) {
    return (
      <LiveKitRoom
        token={livekitToken}
        serverUrl={process.env.NEXT_PUBLIC_LIVEKIT_URL}
        connect={true}
        audio={true}
        video={true}
      >
        <VideoConference />
      </LiveKitRoom>
    );
  }

  // Passive mode - HLS viewer
  return (
    <div>
      {/* Assumes HLSPlayer takes the playlist URL via a `src` prop */}
      <HLSPlayer src={hlsUrl} />

      {/* Raise Hand Button */}
      <div>
        <button
          onClick={handleRaiseHand}
          disabled={isHandRaised}
          className={`px-6 py-3 rounded-full font-medium transition-all ${
            isHandRaised
              ? 'bg-yellow-500 text-black'
              : 'bg-blue-600 hover:bg-blue-700 text-white'
          }`}
        >
          {isHandRaised ? '✋ Hand Raised' : '🙋 Raise Hand'}
        </button>
      </div>

      {/* Passive mode indicator */}
      <div>
        📺 Watching (HLS)
      </div>
    </div>
  );
}

Server-Side Promotion Handler

import { AccessToken } from 'livekit-server-sdk';

app.post('/api/rooms/:roomId/promote/:userId', authenticate, requireModerator, async (req, res) => {
  const { roomId, userId } = req.params;

  // Generate LiveKit token with publishing permissions
  const token = new AccessToken(
    process.env.LIVEKIT_API_KEY!,
    process.env.LIVEKIT_API_SECRET!,
    {
      identity: userId,
      ttl: '24h',
    }
  );

  token.addGrant({
    room: roomId,
    roomJoin: true,
    canPublish: true,
    canSubscribe: true,
    canPublishData: true,
  });

  // toJwt() returns a Promise in livekit-server-sdk v2; awaiting also works with the synchronous v1 API
  const jwt = await token.toJwt();

  // Publish promotion event via NATS.
  // The subject must be captured by a JetStream stream, otherwise publish() rejects.
  await js.publish(`room.${roomId}.user.${userId}.event`, JSON.stringify({
    type: 'PROMOTE',
    token: jwt,
    roomId,
    timestamp: Date.now(),
  }));

  res.json({ success: true, message: 'User promoted successfully' });
});

Performance Results and Lessons Learned

After extensive load testing and production deployment, here are our results:

Metrics

| Metric | Target | Achieved |
|--------|--------|----------|
| Max concurrent users | 1,000 | 1,247 |
| Promotion latency | <5s | 2.3s avg |
| HLS stream delay | <15s | 8.2s avg |
| Infrastructure cost vs pure WebRTC | -50% | -78% |
| Client CPU usage (passive) | <10% | 4.2% |

Key Lessons

- HLS segment duration matters - We settled on 4-second segments as the sweet spot between latency and reliability. Shorter segments (2s) caused buffering issues on poor connections.
- NATS is incredibly lightweight - A single NATS server handles 50,000+ messages/second with minimal resource usage. JetStream's persistence adds negligible overhead.
- SSE > WebSocket for simple push - For one-way server-to-client communication, SSE is simpler to implement than WebSocket. It runs over plain HTTP, so it passes through most load balancers and proxies without special configuration, and the browser's EventSource reconnects automatically.
- Transition UX is critical - Users initially found the 2-3 second promotion delay jarring. Adding a "transitioning" state with a loading animation dramatically improved perceived performance.
- CDN configuration is tricky - HLS playlists and segments need different cache headers: the playlist changes with every new segment and needs a short TTL, while segments never change once written and can be cached far longer. We spent days debugging issues that turned out to be CDN cache invalidation problems.
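To make the CDN lesson concrete, here is a minimal sketch of a cache policy keyed on file extension. The `cacheControlFor` helper, the object keys, and the specific `max-age` values are illustrative assumptions (the playlist TTL is set to half of a 4-second segment as a rule of thumb), not the exact headers from our deployment.

```typescript
// Picks a Cache-Control header for an HLS object based on its extension.
function cacheControlFor(objectKey: string): string {
  if (objectKey.endsWith(".m3u8")) {
    // Live playlists are rewritten for every new segment, so keep the TTL short.
    return "public, max-age=2";
  }
  if (objectKey.endsWith(".ts") || objectKey.endsWith(".m4s")) {
    // Segments are never modified after upload, so they can be cached for a long time.
    return "public, max-age=31536000, immutable";
  }
  // Conservative default for anything unexpected.
  return "no-store";
}

console.log(cacheControlFor("hls/room1/playlist.m3u8"));
console.log(cacheControlFor("hls/room1/stream_00042.ts"));
console.log(cacheControlFor("hls/room1/notes.txt"));
```

A stale playlist is the usual symptom of getting this wrong: viewers keep requesting old segments and appear frozen even though the stream is healthy at the origin.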

Conclusion

Building scalable real-time applications requires thinking beyond single-technology solutions. Our hybrid WebRTC-HLS architecture demonstrates that by intelligently combining technologies based on use case, we can achieve both the interactivity users expect and the scalability businesses require.

The key architectural decisions that made this possible:

- Role-based tier separation - Not all users need the same level of interactivity
- Side-channel signaling with NATS - Decoupling passive viewers from the WebRTC infrastructure
- Seamless client-side transitions - Making the technology invisible to the end user

This approach has allowed us to scale our virtual classroom platform from hundreds to thousands of users while actually reducing infrastructure costs. The same patterns could be applied to webinars, live events, gaming spectator modes, or any scenario where you need to combine real-time interaction with broadcast scale.

---

Have questions about implementing similar architectures? We'd love to hear from you in the comments below.

Scaling Hybrid Classrooms: Reducing Costs by 78% with WebRTC & HLS