A simple WebRTC peer-to-peer bidirectional audio + video call demo with a Node.js signaling server and console clients using native C++ WebRTC bindings. Each client can independently send or receive audio and video, controlled via configuration. Video is displayed in an X11 window.
- Architecture
- Prerequisites
- Getting Started
- Running
- Client Commands
- Client Configuration
- Nanosecond Delay Measurement (PTP)
- Logging
- Example Session
- Signaling Protocol
- Testing
- License
┌──────────┐ WebSocket ┌──────────┐ WebSocket ┌──────────┐
│ Client A ├───────────────────►│ Server │◄───────────────────┤ Client B │
│ (Node.js │ JSON signaling │ (Node.js)│ JSON signaling │ (Node.js │
│ + C++) ├────────────────────┤ ├────────────────────┤ + C++) │
└────┬─────┘ └──────────┘ └────┬─────┘
│ │
│ P2P Bidirectional Audio + Video (after ICE) │
└───────────────────────────────────────────────────────────────┘
The client uses a split-compilation approach with a C ABI boundary:
webrtc_core.cc— compiled with Chromium's clang++ and libc++ to matchlibwebrtc.aABIaddon.cc/peer_connection_wrapper.cc— compiled with Chromium's clang++ using Node.js N-API headers- Communication between the two layers uses only C types (
const char*,int, opaque pointers)
- Linux (Ubuntu 22.04+ or equivalent)
- Node.js 18+ with npm
- Git
- X11 development libraries (
libx11-dev)
git clone --recurse-submodules https://github.com/morosev/mec-cast.git
cd mec-castIf you already cloned without --recurse-submodules:
git submodule update --init --recursiveThe webrtc/src submodule points to our
custom WebRTC fork (branch
mec-cast) which adds the SendTimestampNsExtension (16-byte RTP header
extension carrying capture + send timestamps), forces every frame to carry
timing metadata, and exposes encode duration on decoded frames.
cd webrtc
git clone https://chromium.googlesource.com/chromium/tools/depot_tools.git
export PATH="$PWD/depot_tools:$PATH"
cp .gclient.template .gclient
gclient syncThe .gclient.template configures gclient to use our fork. gclient sync
fetches all third-party dependencies (toolchain, codecs, etc.) required to
build WebRTC. This is a large download (~20 GB) and takes time on the first run.
cd src
sudo ./build/install-build-deps.sh --no-promptgn gen out/release_x64 --args='is_debug=false rtc_include_tests=false proprietary_codecs=true ffmpeg_branding="Chrome"'
ninja -C out/release_x64 webrtcThis produces out/release_x64/obj/libwebrtc.a.
# Server
cd ../../server
npm install
# Client
cd ../client
npm install
npx node-gyp install
chmod +x build.sh
./build.shbuild.sh uses Chromium's clang++ to compile the native addon
(build/Release/webrtc_addon.node), linking against libwebrtc.a and the
libc++ runtime objects from the WebRTC build output.
Open 3 terminal windows:
cd server
npm startThe server listens on port 8080 and logs each client connection/disconnection.
cd client
npm startcd client
npm start| Command | Description |
|---|---|
connect <name> [server] |
Connect to signaling server (default: ws://localhost:8080) |
call |
Start a call (sends audio + video from mic/camera) |
answer |
Answer an incoming call (plays audio, opens video window) |
end |
End the current call |
disconnect |
Disconnect from server |
status |
Show current connection status |
stats on|off |
Toggle live WebRTC stats display (every 2s) |
delay on|off |
Toggle nanosecond delay reporting (every 2s) |
delay report |
Show one-shot delay measurement snapshot |
delay log [file] |
Start logging delay data to CSV file |
delay reset |
Reset delay statistics |
delay ptp |
Show PTP synchronization status |
audioinfo |
Show audio track diagnostics |
videoinfo |
Show video track diagnostics |
help |
Show available commands |
quit |
Exit the client |
The client can be configured via client/client-config.json to automate
connection and call setup. All properties are optional — when omitted or set to
defaults, the client behaves as a fully manual interactive console.
{
"server_address": "ws://localhost:8080",
"username": "alice",
"auto_connect": true,
"auto_answer": true,
"auto_stats_on": false,
"send_audio": true,
"send_video": true
}| Property | Type | Default | Description |
|---|---|---|---|
server_address |
string | "ws://localhost:8080" |
WebSocket URL of the signaling server |
username |
string | "" |
Name to register with on the server |
auto_connect |
bool | false |
Connect to the server automatically on startup |
auto_answer |
bool | false |
Automatically answer incoming calls |
auto_stats_on |
bool | false |
Enable live stats display when a call connects |
send_audio |
bool | true |
Capture and send microphone audio |
send_video |
bool | true |
Capture and send camera video |
Validation rules:
auto_connectrequires bothserver_addressandusernameto be set.
When the config file is absent, the client starts in fully manual mode.
The system includes precise delay measurement using PTP (IEEE 1588) synchronized clocks. This enables nanosecond-resolution one-way delay measurement between sender and receiver in a 5G testbed environment.
The system measures seven delay components across the entire video pipeline. Each metric is computed per-frame and aggregated into running statistics (last, min, max, average, p99, count).
| Metric | Description |
|---|---|
| Network (one-way) | RTP packet transit time (requires PTP sync) |
| Glass-to-glass | Camera capture → display render (end-to-end) |
| Encoding | Time spent in the video encoder |
| Decoding | Time spent in the video decoder |
| Jitter buffer | Time waiting in the receiver jitter buffer |
| Sender pipeline | Capture → RTP send |
| Receiver pipeline | RTP receive → render |
Network (one-way): Measured as send_ns − receive_ns where send_ns is
stamped by the sender at packet departure (in rtp_sender_egress) and
receive_ns is recorded on the receiver when the RTP packet arrives. Both
timestamps use CLOCK_REALTIME, so this metric requires PTP-synchronized clocks
to produce meaningful absolute values. On loopback or same-machine tests, the
offset is near-zero and this effectively measures kernel/stack processing time.
Glass-to-glass: The total end-to-end latency from frame capture on the sender
to frame display on the receiver: render_ns − capture_ns. The capture_ns
timestamp is taken at the moment the camera driver delivers a frame to WebRTC's
video capture module; it is converted from the monotonic capture_time() to
CLOCK_REALTIME at packet egress and transmitted via the header extension. The
render_ns is taken locally when OnFrame() delivers the decoded frame to the
renderer. This metric requires PTP sync for cross-machine accuracy.
Encoding: The time the video encoder (VP8/VP9/H.264/AV1) takes to compress
a single frame. WebRTC's VideoTimingExtension carries encode_start_delta_ms
and encode_finish_delta_ms (relative to capture time) in every packet. On the
receiver, the decoder extracts these and exposes them as encode_duration_ms on
the decoded VideoFrame. Every frame is forced to be a "timing frame" so this
data is always available (not just periodic samples).
Decoding: The time the video decoder takes to decompress a frame. Measured
locally on the receiver using WebRTC's processing_time() field on the decoded
VideoFrame, which records the monotonic timestamps when the frame entered and
exited the decoder (generic_decoder.cc). Converted to CLOCK_REALTIME for
consistency with other metrics.
Jitter buffer: The time a frame spends waiting in the receiver's jitter
buffer before being released to the decoder: decode_start_ns − receive_ns.
This captures the buffering delay introduced to smooth out network jitter.
Higher values indicate more aggressive buffering or variable network conditions.
Sender pipeline: The total sender-side processing time from frame capture to
RTP packet departure: send_ns − capture_ns. This encompasses encoding,
packetization, and pacer queue delay. The pacer introduces intentional delay to
smooth burst transmissions and comply with bandwidth estimates.
Receiver pipeline: The total receiver-side processing time from packet
arrival to frame display: render_ns − receive_ns. This encompasses jitter
buffer wait, decoding, and any rendering queue delay.
The delay measurement system uses a custom RTP header extension to transport sender-side timestamps to the receiver. The extension uses the one-byte header format (RFC 8285) which supports extension elements of 1–16 bytes with IDs 1–14.
Extension URI: http://www.mec-cast.org/experiments/rtp-hdrext/send-timestamp-ns
Wire format (16 bytes):
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| capture_ns (high 32 bits, big-endian) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| capture_ns (low 32 bits, big-endian) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| send_ns (high 32 bits, big-endian) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| send_ns (low 32 bits, big-endian) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- capture_ns (bytes 0–7):
CLOCK_REALTIMEnanoseconds at frame capture. Computed at packet egress by converting the monotoniccapture_time()stored on theRtpPacketToSendto real-time using the offsetCLOCK_REALTIME − CLOCK_MONOTONICsampled at that instant. - send_ns (bytes 8–15):
CLOCK_REALTIMEnanoseconds at packet departure. Stamped inrtp_sender_egress.ccjust before the packet is handed to the transport layer.
Why 16 bytes? The one-byte RTP header extension format encodes element length
in a 4-bit field as L where actual length = L + 1, giving a maximum of
16 bytes per element. Our extension uses exactly this maximum to carry two
64-bit nanosecond timestamps. Using 64-bit nanoseconds (rather than 32-bit
milliseconds) provides sub-microsecond resolution necessary for PTP-grade
measurements and avoids wrap-around issues (a 64-bit ns counter won't wrap
for ~584 years).
RTP header integration: The extension is negotiated in SDP via the standard
a=extmap mechanism. WebRTC assigns it an ID (typically 12) during offer/answer
negotiation. In the RTP packet, it appears as:
RTP Header (12 bytes) → [CSRC if any] → Extension Header:
┌─────────────────────────────────────────┐
│ 0xBEDE (one-byte ext magic) | length=4 │ ← 4 = 16 bytes / 4 (32-bit words)
├─────────────────────────────────────────┤
│ ID(4b) | L=15(4b) | 16 bytes of data │ ← L=15 means 16 bytes
└─────────────────────────────────────────┘
The extension adds 20 bytes of overhead per RTP packet (4-byte extension header + 16 bytes payload). At 30 fps video with ~3 packets per frame, this amounts to ~1.8 KB/s of additional bandwidth — negligible compared to the video bitrate.
Direction: The extension is registered as kSendRecv so both peers in a
bidirectional call stamp their outgoing packets and extract timestamps from
incoming packets, enabling delay measurement in both directions simultaneously.
The delay measurement system relies on synchronized wall-clock time between sender and receiver. Without clock synchronization, one-way metrics (network, glass-to-glass, sender/receiver pipeline) are meaningless because timestamps from different machines cannot be compared.
How PTP is used:
- The PTP daemon (
ptp4l) synchronizes the NIC's hardware clock (PHC) to a grandmaster clock on the network via IEEE 1588 Precision Time Protocol. phc2syscontinuously disciplinesCLOCK_REALTIMEfrom the PHC, maintaining sub-microsecond alignment between the system clock and the PTP time domain.- Our code calls
clock_gettime(CLOCK_REALTIME)for all timestamps. When PTP is active, this clock is phase-locked to the grandmaster with nanosecond-level accuracy, making cross-machine timestamp comparisons valid. - The
PtpMonitormodule periodically reads the PHC offset (viaioctlon/dev/ptp0) and reports synchronization quality. The delay report shows "PTP Sync: ✓ RELIABLE" when the measured offset is below the configured threshold (default 1 µs).
Timing points in the pipeline:
Sender Receiver
────── ────────
Camera grab ──→ capture_ns (CLOCK_REALTIME)
│
▼
Encoder (VP8/H264)
│
▼
Packetizer + Pacer
│
▼
RTP Egress ───→ send_ns (CLOCK_REALTIME)
│
│ ~~~ Network ~~~
▼
RTP Arrival ──→ receive_time (CLOCK_MONOTONIC → CLOCK_REALTIME)
│
▼
Jitter Buffer (wait)
│
▼
Decoder (VP8/H264)
│
▼
OnFrame() ────→ render_ns (CLOCK_REALTIME)
When PTP hardware is not available (no /dev/ptp0), the system falls back to
CLOCK_REALTIME without hardware disciplining. This has several implications:
| Limitation | Impact |
|---|---|
| No accurate one-way delay | Network delay values are unreliable because CLOCK_REALTIME on two machines may differ by milliseconds (NTP) or more. Values may even be negative. |
| Glass-to-glass is approximate | Without PTP, the capture_ns and render_ns are on different time bases. On the same machine (loopback test), this works fine. |
| Sender/receiver pipeline split is invalid | Cross-machine sender_pipeline and receiver_pipeline values include clock offset error. |
| Local-only metrics still work | Encoding, decoding, and jitter buffer delays are measured entirely on one machine using monotonic clocks, so they remain accurate regardless of PTP. |
| Same-machine testing is valid | When both clients run on the same host, they share CLOCK_REALTIME and all metrics are accurate (the "PTP Sync: ✗ UNRELIABLE" warning can be ignored). |
Signaling-based NTP fallback: The system includes a signaling-based clock
synchronization protocol (clock_sync_request/clock_sync_response messages)
that estimates clock offset using NTP-style round-trip measurement via the
WebSocket connection. This provides ~1–5 ms accuracy for development use, but
is insufficient for production measurements where sub-microsecond precision is
required.
Recommendation: For production measurements in a 5G MEC testbed, always use PTP with hardware timestamping. For development and functional testing, the same-machine loopback setup provides accurate relative measurements without any PTP infrastructure.
For accurate one-way delay measurement, both nodes must have PTP-synchronized
clocks. The system uses the PTP Hardware Clock (PHC) exposed at /dev/ptp0.
Hardware needed:
- NIC with IEEE 1588 hardware timestamping support (e.g., Intel i210, i225, X710, or Mellanox ConnectX)
- PTP Grandmaster clock or GPS-disciplined time source on the network
- (For best results in 5G testbed) Dedicated PTP infrastructure with hardware timestamping throughout
Software setup:
-
Install
linuxptp:sudo apt install linuxptp
-
Start PTP synchronization on your NIC (e.g.,
eth0):sudo ptp4l -i eth0 -m -2
-
Discipline the system clock from the PTP Hardware Clock:
sudo phc2sys -s /dev/ptp0 -c CLOCK_REALTIME -O 0 -m
-
Verify synchronization (offset should be <1µs):
# Watch ptp4l output for "master offset" values # Watch phc2sys output for "offset" values
Expected accuracy:
| Setup | Accuracy |
|---|---|
| PTP with HW timestamping (same switch) | 10–100 ns |
| PTP with HW timestamping (multiple hops) | 100–500 ns |
| PTP with SW timestamping | 1–10 µs |
| No PTP (software NTP fallback) | 1–5 ms |
All screen output and errors are mirrored to log files, recreated on each application start:
- Client:
client/log/client.log - Server:
server/log/server.log
WebRTC native-layer logs are written separately to
client/log/<username>_webrtc.log.
Client A:
> connect alice
[12:00:01] Connected to server as "alice".
Client B:
> connect bob
[12:00:05] Connected to server as "bob".
[12:00:05] Peer joined: "alice". You can now type 'call' to start a call.
Client A sees:
[12:00:05] Peer joined: "bob". You can now type 'call' to start a call.
Client A:
> call
[12:00:10] Calling "bob"...
[12:00:10] Offer created and sent to peer.
Client B sees:
[12:00:10] Incoming call from "alice"!
[12:00:10] Type "answer" to accept the call.
> answer
[12:00:12] Answering call...
[12:00:12] Answer created and sent to peer.
[12:00:13] P2P connection established!
To end the call:
> end
[12:00:30] Call ended.
All messages are JSON over WebSocket:
| Message | Direction | Purpose |
|---|---|---|
{ type: "register", name } |
Client → Server | Register with server |
{ type: "registered", name } |
Server → Client | Registration confirmed |
{ type: "peer_joined", name } |
Server → Client | Another peer connected |
{ type: "peer_left", name } |
Server → Client | Peer disconnected |
{ type: "offer", sdp } |
Client → Server → Client | SDP offer |
{ type: "answer", sdp } |
Client → Server → Client | SDP answer |
{ type: "ice_candidate", candidate } |
Client → Server → Client | ICE candidate |
{ type: "hangup" } |
Client → Server → Client | End call |
{ type: "ping" } |
Client → Server | Heartbeat ping (every 10s) |
{ type: "pong" } |
Server → Client | Heartbeat pong response |
{ type: "clock_sync_request", t1_ns } |
Client → Server → Client | NTP-style clock sync request |
{ type: "clock_sync_response", t1_ns, t2_ns, t3_ns } |
Client → Server → Client | Clock sync response |
The client sends a ping message to the server every 10 seconds. The server
responds with a pong. If the server receives no ping from a client within
30 seconds (3 missed intervals), it considers the client dead, removes it, and
notifies the remaining peer via peer_left. If the client receives no pong
within 30 seconds, it logs a timeout notification and transitions to the
disconnected state.
Run a full end-to-end test locally (server + 2 clients + call + delay report):
./tests/e2e_local.sh [duration_seconds]Default duration is 10 seconds. The script:
- Starts the signaling server on port 8080
- Connects two clients (alice and bob)
- Alice calls bob (auto-answered)
- Waits for the specified duration while streaming
- Prints delay measurement reports
- Verifies P2P connection was established
- Cleans up all processes on exit
Prerequisites: server deps installed, client addon built, port 8080 free.
MIT License — see LICENSE.