Advanced WebRTC troubleshooting

If you're still having trouble connecting to calls over a complex network configuration, it may help to understand what's happening at a lower level so you can make appropriate configuration changes.

WebRTC calls use at least two network connections. The first is to a signaling server (usually using HTTP or WebSockets) to exchange call information. The second is a direct connection between you and the person you're talking to. All of the call media (audio and video) flows over that second connection. On peer-to-peer calls with more people, you'll have a separate, direct connection to each person for sending and receiving media. These media connections usually run over UDP, which has lower latency than TCP.

'Server' or SFU calls still use those same connections. But instead of connecting directly to every other person to send media, you have a single peer-to-peer connection with a central server. That server selectively forwards different tracks to each participant, which is why it's called a Selective Forwarding Unit, or SFU. The connection architecture is otherwise the same as in a P2P call.
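To make the difference between the two call types concrete, here is a minimal sketch that counts media connections for a call of a given size. The function and type names are hypothetical and exist only for illustration. The signaling connection is not counted.

```typescript
type CallMode = "p2p" | "sfu";

// Media connections a single participant maintains in a call of the given size.
function mediaConnectionsPerParticipant(mode: CallMode, participants: number): number {
  if (!Number.isInteger(participants) || participants < 1) {
    throw new RangeError("participants must be a positive integer");
  }
  // P2P: one direct connection to every other participant.
  // SFU: a single connection to the server, whatever the call size.
  return mode === "p2p" ? participants - 1 : 1;
}

// Media connections across the whole call.
function totalMediaConnections(mode: CallMode, participants: number): number {
  const perParticipant = mediaConnectionsPerParticipant(mode, participants);
  // A P2P connection is shared by two peers, so halve the sum to avoid double counting.
  return mode === "p2p" ? (participants * perParticipant) / 2 : participants * perParticipant;
}
```

For a five-person call, this gives four connections per participant (ten in total) in peer-to-peer mode, and one connection per participant (five in total) in SFU mode.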

To establish that peer-to-peer media connection, the two peers use the signaling connection to find the most direct way to talk to each other, using a process called ICE (Interactive Connectivity Establishment). Peers usually can't connect directly to each other by IP and port (unless they're on the same LAN), because NATs and firewalls sit between them and the internet. There are two tools that can help: STUN and TURN.

STUN uses a set of publicly available servers to help peers discover how they appear from the outside. Each peer pings a STUN server, and that server responds to the ping with a message including the public-facing address it saw the ping come from. The peers can then exchange those public-facing addresses with each other and try to open direct connections.
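When you inspect a call, the address a STUN server reports back shows up as a "server reflexive" ICE candidate, marked `typ srflx` in the candidate line. The sketch below parses such a line so you can see which kinds of candidates a peer gathered. The function name and the sample addresses are hypothetical, and the parser reads only the leading fields of the candidate line, not the full grammar.

```typescript
interface ParsedCandidate {
  protocol: string; // "udp" or "tcp"
  address: string;  // address the remote peer would try to reach
  port: number;
  type: string;     // "host", "srflx", "prflx", or "relay"
}

// Parses the leading fields of an ICE candidate line:
//   candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type> ...
// Returns null when the line does not follow that layout.
function parseCandidate(line: string): ParsedCandidate | null {
  const parts = line.trim().replace(/^a=/, "").split(/\s+/);
  const [foundation = "", , transport = "", , address = "", portText = "", typKeyword = "", type = ""] = parts;
  if (!foundation.startsWith("candidate:") || typKeyword !== "typ" || type === "") {
    return null;
  }
  const port = Number(portText);
  if (!Number.isInteger(port) || port < 0 || port > 65535) {
    return null;
  }
  return { protocol: transport.toLowerCase(), address, port, type };
}
```

For example, passing `"candidate:842163049 1 udp 1677729535 203.0.113.7 46154 typ srflx raddr 192.168.1.10 rport 46154"` yields a candidate of type `srflx` with the public-facing address `203.0.113.7`. If a peer never gathers any `srflx` candidates, its STUN requests are probably not getting through.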

If STUN doesn't reveal any addresses that result in successful connections, the peers can also try TURN. TURN servers are also publicly reachable, but they act as bidirectional relays. If Alice and Bob can't connect directly to each other because of firewalls or NATs, they can use a TURN server instead. Alice establishes a connection to that server, and any media she sends gets forwarded along to Bob, and vice versa.

TURN servers still prefer UDP connections, but a peer connecting from an especially restrictive network can also send media to a TURN server using TCP. This adds a small, usually acceptable amount of latency, and sometimes it's the only way to get media to and from a peer behind a firewall that blocks UDP traffic.
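In browser code, the STUN and TURN options described above are expressed as a list of ICE server URLs. The sketch below builds such a list. The hostnames, ports, and credentials are placeholders and not the servers any particular service uses. The `IceServer` interface is declared locally so the example does not depend on browser type definitions.

```typescript
interface IceServer {
  urls: string[];
  username?: string;
  credential?: string;
}

// Builds an ICE server list covering STUN, TURN over UDP, TURN over TCP,
// and TURN over TLS on port 443. All arguments are hypothetical inputs.
function buildIceServers(
  stunHost: string,
  turnHost: string,
  username: string,
  credential: string,
): IceServer[] {
  return [
    // STUN: used only to discover the peer's public-facing address.
    { urls: [`stun:${stunHost}:3478`] },
    // TURN: relays media when no direct path works. The transport
    // parameter controls how the peer reaches the relay.
    {
      urls: [
        `turn:${turnHost}:3478?transport=udp`,
        `turn:${turnHost}:3478?transport=tcp`,
        `turns:${turnHost}:443?transport=tcp`,
      ],
      username,
      credential,
    },
  ];
}
```

In a browser, a list like this would be passed as `iceServers` in the configuration given to `new RTCPeerConnection(...)`. When testing a restrictive network, setting `iceTransportPolicy: "relay"` in that same configuration forces the connection through TURN, which helps confirm whether the relay path works at all.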

The aforementioned * hostnames are our SFUs, which run our signaling channel as well as the media connections for server calls. You'll need to be able to open a WebSocket connection to those hostnames in order for our signaling channel to work. Fortunately, browsers handle those requests in well-understood ways by now, and they're rarely the source of the problem.

* and * are hostnames from services that we use for STUN and TURN. Our ICE negotiation is set up to first try using STUN to establish a direct connection between peers, then try TURN over UDP, and then finally fall back to TURN over TCP/TLS on port 443. All of those rely on the network infrastructure between the peer and the internet behaving somewhat consistently.
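To see which of those fallbacks a call ended up on, you can look at the local ICE candidate of the connection in use. The sketch below classifies it. It assumes you have already pulled the `candidateType` and `relayProtocol` fields from a `local-candidate` entry in the browser's `RTCPeerConnection.getStats()` report. The type and function names here are hypothetical, and the check looks only at the local side of the connection.

```typescript
type CandidateType = "host" | "srflx" | "prflx" | "relay";
type RelayProtocol = "udp" | "tcp" | "tls";

interface LocalCandidateInfo {
  candidateType: CandidateType;
  relayProtocol?: RelayProtocol; // only set for relay candidates
}

type Route = "direct" | "turn-udp" | "turn-tcp" | "turn-tls" | "turn-unknown";

// Names the path the local peer is using to send media.
function describeRoute(candidate: LocalCandidateInfo): Route {
  if (candidate.candidateType !== "relay") {
    return "direct"; // host, srflx, and prflx candidates do not use a relay
  }
  switch (candidate.relayProtocol) {
    case "udp":
      return "turn-udp";
    case "tcp":
      return "turn-tcp";
    case "tls":
      return "turn-tls";
    default:
      return "turn-unknown"; // relay in use, but the transport was not reported
  }
}

// Position in the fallback order described above:
// 1 = direct, 2 = TURN over UDP, 3 = TURN over TCP/TLS.
function fallbackStep(route: Route): number | null {
  switch (route) {
    case "direct":
      return 1;
    case "turn-udp":
      return 2;
    case "turn-tcp":
    case "turn-tls":
      return 3;
    default:
      return null;
  }
}
```

A call that consistently lands on step 3 suggests the network is blocking UDP, which is worth raising with whoever manages the firewall.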

Even if a network is very tightly controlled, things should work as long as it behaves consistently (though connecting may take a while). If STUN fails to identify a direct connection and TURN over UDP isn't possible, a TURN connection over TCP/TLS on port 443 should still work, because it's essentially indistinguishable from normal HTTPS traffic. But if a proxy isn't allowing connections to Twilio or Xirsys at all, for example, it won't be possible to use Daily through that proxy.