Experiments with WebSocket Performance
by Mark Logan
12 June 2012
Networking is one of the biggest obstacles facing HTML5 game developers. While WebSockets provide a TCP-like communication mechanism, game networking often relies on UDP, and there’s no way to do UDP-like communication in the browser without a plug-in.
Why do games often rely on UDP? Imagine you need your server to send a small message to a player every 100ms. In a perfect world there’s relatively little difference between using UDP and TCP for such a task. You send the player a message, and some amount of time later, the player gets it.
Of course, we don’t live in a perfect world, and in our imperfect world packets occasionally get dropped. If you’re using UDP, and a message gets dropped, the player only has to wait an additional 100ms to get the next message (assuming that it isn’t dropped as well). That is, new messages are sent out every 100ms, and the loss of one message can’t delay the arrival of the next.
With TCP, the player’s computer will receive the packet containing the next message 100ms after the missing message, but the operating system won’t pass that data to the game program, because it has detected a gap in the TCP stream. That gap needs to be filled in before the game program sees any new data. How does it get filled in? The sender waits a certain amount of time (see Computing TCP’s Retransmission Timer for more information) for the receiver to acknowledge the missing packet. Once that time has elapsed with no acknowledgement, the sender retransmits the packet that was dropped. Depending on how long the retransmission timeout is, this can add up to a sizeable delay, which in turn can cause a noticeable blip in the responsiveness of your game. Worse still, subsequent messages can’t be delivered to the game code until this retransmission has happened, so one dropped packet can slow down several others.
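The head-of-line blocking described above can be sketched in a few lines of Python. This is a toy model, not a TCP implementation: the one-way delay, the 200ms retransmission timeout, and the choice of which message gets dropped are all assumptions made up for illustration.

```python
SEND_INTERVAL = 100   # ms between messages
ONE_WAY = 25          # ms one-way network delay (assumed)
RTO = 200             # ms retransmission timeout (assumed round number)
DROPPED = {3}         # index of the message whose first packet is lost

delivery = []         # time each message reaches the game code
deliverable = 0       # how far the in-order stream has caught up
for i in range(6):
    sent = i * SEND_INTERVAL
    arrived = sent + ONE_WAY
    if i in DROPPED:
        arrived = sent + RTO + ONE_WAY   # retransmitted copy arrives late
    # TCP hands data to the application strictly in order, so a message
    # can't be delivered before every earlier message has arrived
    deliverable = max(deliverable, arrived)
    delivery.append(deliverable)
    print(f"msg {i}: sent {sent:3d} ms, delivered {deliverable:3d} ms")
```

Notice that messages 4 and 5 arrive at the player’s machine on time, yet they sit in the kernel’s buffer until the retransmission of message 3 shows up.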
But what’s the real effect of all this in practice?
In the tests below, I used ipfw (Mac OS X’s firewall/router tool) to model different amounts of latency and packet loss, and took 250 samples.
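The measurement loop itself is simple: send a small message, wait for the echo, and record the round trip. The original test ran over a WebSocket in the browser; the stand-in below uses a plain TCP socket against a local echo server so it’s self-contained, and the message size and loopback setup are my own assumptions.

```python
import socket, threading, time

def recv_exact(conn, n):
    """Read exactly n bytes (TCP is a byte stream, not a message stream)."""
    buf = b""
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed")
        buf += chunk
    return buf

def echo_server(listener):
    conn, _ = listener.accept()
    with conn:
        while True:
            data = conn.recv(64)
            if not data:
                break
            conn.sendall(data)

listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]
threading.Thread(target=echo_server, args=(listener,), daemon=True).start()

samples = []
with socket.create_connection(("127.0.0.1", port)) as client:
    # disable Nagle's algorithm so small messages go out immediately
    client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    for _ in range(250):                     # 250 samples, as in the tests
        start = time.perf_counter()
        client.sendall(b"ping")
        recv_exact(client, 4)
        samples.append((time.perf_counter() - start) * 1000.0)  # ms

print(f"median round trip: {sorted(samples)[len(samples) // 2]:.3f} ms")
```

A histogram of those samples is exactly what the plots in this post show.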
What should we expect to see when we run this experiment? Most of the measurements will be clustered around a single value, specifically the round-trip-time between the client and the server. But if any packets get dropped during our test, we’ll see a few messages that take longer to return.
So the two parameters most significant to our results are the baseline round-trip-time, and the packet loss rate.
Before I show you all the data I gathered, go ahead and run these measurements from your own machine: (Note: you’ll need to be using Chrome or Firefox for this to work. If you don’t have a recent browser, scroll down to see the measurements I’ve already taken for you.)
Hopefully, your connection doesn’t have any packet loss right now, and so you’ll just see one bar in the above histogram. But we’d like to see the impact of varying rates of packet loss, so we’ll have to somehow induce the packet loss ourselves.
On OS X, it’s easy to model latency and packet loss with the ipfw tool. To simulate different packet loss rates on a low-latency connection, I first ran these commands (as root):
$ ipfw add pipe 1 ip from any to any out
$ ipfw add pipe 2 ip from any to any in
$ ipfw pipe 1 config delay 12ms
$ ipfw pipe 2 config delay 12ms
This results in a round-trip time of about 50ms: each packet is delayed by 12ms twice in each direction, for a total of 48ms, which is pretty close to 50. After running these commands, I measured the message latency at a variety of packet loss rates and made histograms from the results.
50ms Latency, 0.1% Packet Loss
Now, let’s add some packet loss.
$ ipfw pipe 2 config delay 12ms plr 0.0005
This means about 0.1% of messages will get dropped: each message has a 0.05% chance of being dropped on the way out, and another 0.05% chance on the way back. (Yes, I know it’s actually 100 × (1 − .9995²) percent. 0.1% is close enough.)
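For the skeptical, the exact arithmetic is a one-liner: a message survives the round trip only if both the outgoing and the returning packet get through, each with probability 1 − p.

```python
p = 0.0005                           # per-direction drop rate (the plr value)
round_trip_loss = 100 * (1 - (1 - p) ** 2)
print(f"{round_trip_loss:.5f}%")     # just under 0.1%
```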
50ms Latency, 1% Packet Loss
Now we bump it up a little further.
$ ipfw pipe 2 config delay 12ms plr 0.005
50ms Latency, 5% Packet Loss
$ ipfw pipe 2 config delay 12ms plr 0.025
I ran the same experiments again, with 100ms of latency.
$ ipfw pipe 1 config delay 25ms
$ ipfw pipe 2 config delay 25ms
100ms Latency, 0.1% Packet Loss
$ ipfw pipe 2 config delay 25ms plr 0.0005
100ms Latency, 1% Packet Loss
$ ipfw pipe 2 config delay 25ms plr 0.005
100ms Latency, 5% Packet Loss
$ ipfw pipe 2 config delay 25ms plr 0.025
The most useful plots to look at above are probably the ones for 0.1% packet loss; a connection with 1% packet loss is already verging on unusable for most people. Even at that low rate, about 3% of messages arrive late.
Look at the percentage of messages that arrive “on time” in each plot. It’s far lower than you might naively expect based on the packet loss rates. For instance, at 1% packet loss, only about 95% of messages arrive on time. This is because one dropped packet delays not only the message that was in that packet, but also several subsequent messages. This is exactly why TCP isn’t the right choice for certain types of multiplayer games.
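A quick simulation makes the multiplier effect concrete. The interval, one-way delay, and especially the 400ms retransmission timeout below are assumed round numbers, not measured values, so treat the output as a back-of-the-envelope check rather than a prediction.

```python
import random

random.seed(1)                      # deterministic for illustration
INTERVAL, ONE_WAY = 100, 25         # ms; message interval and one-way delay
RTO = 400                           # ms; assumed retransmission timeout
LOSS = 0.01                         # 1% packet loss

N = 100_000
on_time = 0
deliverable = 0                     # in-order delivery frontier
for i in range(N):
    sent = i * INTERVAL
    arrived = sent + ONE_WAY + (RTO if random.random() < LOSS else 0)
    deliverable = max(deliverable, arrived)   # held behind any missing data
    if deliverable == sent + ONE_WAY:         # delivered with no extra delay
        on_time += 1
print(f"on time: {100 * on_time / N:.1f}%")
```

With these assumptions, each drop stalls the stream long enough to delay the next few messages as well, so 1% loss translates into several percent of late messages, in line with the measurements above.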
We’ll be working hard to make multiplayer HTML5 games a reality, despite these obstacles. Stay tuned!