A Short on How Zoom Works

Zoom scaled from 20 million to 300 million users virtually overnight. What's incredible is that from the outside they've shown few apparent growing pains, though on the inside it's a good bet a lot of craziness is going on.

Sure, Zoom has made some design decisions that made sense for a small, spunky startup but don't make a lot of sense for a de facto standard. That's to be expected. It's not a sign of bad architecture, as many have suggested; it's just realistically how products evolve, especially when they must scale up over weeks, days, and even hours.

Sudden success invites scrutiny, so everyone wants to know how Zoom works. The problem is we don't know much, but we do have a few information sources.

Here's a gloss of a few of those sources:

  • There was quite a kerfuffle about Zoom's datacenter usage. The upshot is they started with their own co-lo spaces and then branched out to use multiple clouds as growth spiked. Almost textbook execution of how to handle sudden growth.
  • Most of Zoom runs on AWS, not Oracle, says AWS: the service has moved a large quantity of real-time video-conferencing traffic to AWS since the pandemic struck, and has also placed a lesser amount of capacity on the Oracle Cloud... CEO Eric Yuan clarified this further, explaining that Zoom historically handled real-time video conferencing traffic in "its own data centers"... Our real-time traffic always stayed inside our own data center[s] for our paid customers... During this pandemic crisis, every day is a new record. Our own existing data center[s] really cannot handle this traffic... This meant that AWS spun up thousands of new servers for Zoom every day... So ultimately, our own data center[s], and primarily Amazon, and also the Oracle cloud, those three together to serve all the unprecedented traffic.
  • We don't know a lot about Zoom's architecture, but this marketing video went into some detail: How Zoom's Unique Architecture Powers Your Video First UC Future.
  • Zoom sees their architecture as a competitive advantage. Everyone will be using video, so how do you scale to everyone? Zoom started with the goal of video everywhere and let that goal shape their architecture.
  • Competitors trombone traffic through a datacenter, transcode it into a normal view for everybody else, and then send mixed video out to every individual participant. That introduces latency, uses a lot of CPU resources, and makes it hard to scale and deploy new datacenters to meet increased load.
  • Zoom chose SVC (Scalable Video Coding) over AVC. With AVC you send a single stream at a single bitrate; if you want to send multiple bitrates you have to send multiple streams, which increases bandwidth utilization.
  • SVC is a single stream with multiple layers. That allows sending one ~1.2 Mbps stream that contains every resolution and bitrate you may need to scale down to given network conditions (see the layering sketch after this list). In the past you could only do SVC with an ASIC. Now, thanks to Moore’s law, SVC can be done in software.
  • Zoom created Multimedia Routing to solve the problems traditional vendors have with AVC. Cutting out transcoding got rid of latency and increased scale.
  • Multimedia routing takes user content into their cloud, and when you as a client run into issues they switch a different video stream to you. When you want a different resolution you subscribe to a different layer of that person’s stream (see the routing sketch after this list).
  • Zoom does not transcode or mix anything or compose any views. You are literally pulling multiple streams from multiple people directly off the router with zero processing. This is why you see such a good user switching and voice switching experience and such low latency.
  • Zoom developed an application layer QoS (Quality of Service) mechanism that works between the cloud and the client. Its job is to detect network conditions. The gathered telemetry determines which stream is switched to a client; the algorithm looks at CPU, jitter, packet loss, etc. (see the QoS sketch after this list).
  • The client talks to the cloud. The cloud knows when it doesn't get certain packets back, so it will make decisions and switch a different stream down to you.
  • The client can automatically downsize your own send video if there's a bad network environment, so you're not killing your own downstream bandwidth.
  • The client and the cloud work in tandem to deliver the right audio stream, the right video stream, across the right network, so the user experience is as good as it can be.
  • Being network aware means trying for the best experience first, which is UDP. If UDP doesn’t work it tries HTTPS. If HTTPS doesn’t work it falls back to HTTP. The client negotiates that fallback (see the transport fallback sketch after this list). Telemetry shows why the connection was bad. The worst thing you can do is give the user an inconsistent experience.
  • The focus is on making everything just work as simply and intuitively as possible. This point was emphasized and repeated, which may explain some of the earlier design decisions.
  • At this point the talk went in a more marketing-focused direction.
  • Zoom disrupted the market with 40 minute meetings with video and chat. They added free dial-in conferencing. They deliver the best VOIP experience in the market: competitors' average VOIP adoption is less than 30%, while Zoom's is 89%. $3 billion a year is spent on audio conferencing and Zoom gives it away for free. They delivered a software-based video conferencing room experience, delivered one button push for competitors, and gave away digital signage and room displays.
  • Zoom's competitors are sunk in revenue models they can't get out of. They can't innovate because they'd disrupt their own revenue model.
  • Zoom disrupted the meetings market, disrupted the audio market, disrupted the rooms market, and now they want to disrupt telephony. This was 2019, though; with the pandemic, that strategy may be revisited.
  • Zoom's goal is to create the largest network of connected collaboration. They want to deliver on the promise of VOIP from twenty years ago: tearing down every pay wall that keeps people from collaborating with each other, rolling out PSTN connectivity, and connecting everyone through chat, meetings, and phone, all across IP, at the lowest rate on any network.
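
To make the SVC layering point concrete, here's a minimal sketch in Python. The layer names, resolutions, and bitrates are made-up assumptions chosen so the layered stream adds up to roughly the 1.2 Mbps mentioned in the talk; they are not Zoom's actual encoding ladder. The idea it illustrates: one layered SVC stream carries every rendition, while AVC simulcast has to upload each rendition as a separate, self-contained stream.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Layer:
    name: str
    resolution: str
    kbps: int  # incremental bitrate this layer adds on top of the layers below it

# One SVC stream: a base layer plus enhancement layers (~1.2 Mbps in total).
SVC_LAYERS = [
    Layer("base", "180p", 150),
    Layer("enh1", "360p", 350),
    Layer("enh2", "720p", 700),
]

def svc_sender_kbps() -> int:
    """The sender uploads one layered stream; every bitrate lives inside it."""
    return sum(layer.kbps for layer in SVC_LAYERS)

def avc_simulcast_sender_kbps() -> int:
    """With AVC each rendition is a separate self-contained stream, so the
    sender pays the full (cumulative) bitrate once per rendition."""
    cumulative = 0
    total = 0
    for layer in SVC_LAYERS:
        cumulative += layer.kbps  # bitrate of this rendition if sent standalone
        total += cumulative
    return total

def layers_for_downlink(available_kbps: int) -> list[Layer]:
    """A receiver takes the largest prefix of layers that fits its downlink."""
    chosen, used = [], 0
    for layer in SVC_LAYERS:
        if used + layer.kbps > available_kbps:
            break
        chosen.append(layer)
        used += layer.kbps
    return chosen

if __name__ == "__main__":
    print("SVC sender upload:", svc_sender_kbps(), "kbps")               # 1200
    print("AVC simulcast upload:", avc_simulcast_sender_kbps(), "kbps")  # 1850
    print("Layers a 600 kbps downlink can take:",
          [layer.resolution for layer in layers_for_downlink(600)])      # 180p, 360p
```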
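
Here's a minimal sketch of the multimedia routing idea under the same assumptions: the router stores each participant's encoded layers as opaque bytes and forwards whichever layer each subscriber asked for, with no decoding, mixing, or view composition. The class and method names are illustrative, not Zoom's actual API.

```python
from collections import defaultdict

class MultimediaRouter:
    """Forwards encoded layers between participants without touching them."""

    def __init__(self) -> None:
        # publisher -> layer name -> latest encoded packet (opaque bytes)
        self.streams: dict = defaultdict(dict)
        # subscriber -> publisher -> layer name the subscriber wants
        self.subscriptions: dict = defaultdict(dict)

    def publish(self, publisher: str, layer: str, packet: bytes) -> None:
        """Store the publisher's packet for one layer, untouched."""
        self.streams[publisher][layer] = packet

    def subscribe(self, subscriber: str, publisher: str, layer: str) -> None:
        """A client switches resolution by subscribing to a different layer."""
        self.subscriptions[subscriber][publisher] = layer

    def packets_for(self, subscriber: str) -> dict:
        """Hand back each subscribed publisher's chosen layer, zero processing."""
        out = {}
        for publisher, layer in self.subscriptions[subscriber].items():
            packet = self.streams[publisher].get(layer)
            if packet is not None:
                out[publisher] = packet
        return out

if __name__ == "__main__":
    router = MultimediaRouter()
    router.publish("alice", "180p", b"<alice 180p frame>")
    router.publish("alice", "720p", b"<alice 720p frame>")

    router.subscribe("bob", "alice", "720p")
    print(router.packets_for("bob"))           # bob gets alice's 720p layer as-is

    router.subscribe("bob", "alice", "180p")   # bob's network degrades: drop a layer
    print(router.packets_for("bob"))
```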
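
The application layer QoS loop could look something like the sketch below. The thresholds and the bitrate ladder are invented for illustration; the talk only says that telemetry such as CPU, jitter, and packet loss drives which stream the cloud switches down to a client.

```python
from dataclasses import dataclass

@dataclass
class Telemetry:
    cpu_percent: float
    jitter_ms: float
    packet_loss_percent: float
    downlink_kbps: int

# Illustrative ladder, best rung first: (label, downlink kbps it needs).
LADDER = [("720p", 1200), ("360p", 500), ("180p", 150)]

def pick_stream(t: Telemetry) -> str:
    """Pick the best rung the client can actually sustain right now."""
    budget = t.downlink_kbps
    # Back off when the device or network is struggling (thresholds assumed).
    if t.cpu_percent > 85 or t.jitter_ms > 100 or t.packet_loss_percent > 5:
        budget = int(budget * 0.5)
    for label, required_kbps in LADDER:
        if budget >= required_kbps:
            return label
    return "audio-only"  # last resort: keep the call alive without video

if __name__ == "__main__":
    healthy = Telemetry(cpu_percent=40, jitter_ms=15,
                        packet_loss_percent=0.5, downlink_kbps=2000)
    struggling = Telemetry(cpu_percent=90, jitter_ms=120,
                           packet_loss_percent=8, downlink_kbps=600)
    print(pick_stream(healthy))      # 720p
    print(pick_stream(struggling))   # 180p (budget halved to 300 kbps)
```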
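
Finally, a minimal sketch of the UDP to HTTPS to HTTP transport fallback. The probe functions are placeholders standing in for real connection attempts, and the failure log stands in for the telemetry that shows why a connection was bad.

```python
from typing import Callable

def negotiate_transport(probes: dict) -> tuple:
    """Try transports best-experience-first and return the first that works,
    plus a log of failures so telemetry can show why the connection was bad."""
    failures = []
    for transport in ("udp", "https", "http"):
        try:
            if probes[transport]():
                return transport, failures
            failures.append(f"{transport}: probe returned no media")
        except Exception as exc:  # e.g. blocked port, timeout, proxy trouble
            failures.append(f"{transport}: {exc}")
    raise ConnectionError(f"no transport available: {failures}")

if __name__ == "__main__":
    # Simulate a network where UDP is blocked by a firewall but HTTPS works.
    def udp_probe() -> bool:
        raise TimeoutError("UDP media port blocked by firewall")

    probes: dict[str, Callable[[], bool]] = {
        "udp": udp_probe,
        "https": lambda: True,
        "http": lambda: True,
    }
    transport, failures = negotiate_transport(probes)
    print("using", transport, "after", failures)
```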