Building Netflix’s Distributed Tracing Infrastructure

The Netflix TechBlog

If we had an ID for each streaming session then distributed tracing could easily reconstruct session failure by providing service topology, retry and error tags, and latency measurements for all service calls. Using simple lookup indices in Cassandra gives us the ability to maintain acceptable read latencies while doing heavy writes.

Percentiles don’t work: Analyzing the distribution of response times for web services

Adrian Cockcroft

There is no way to model how much more traffic you can send to that system before it exceeds its SLA. Bill Kaiser of NewRelic published this blog in 2017 which goes some way towards what I’m talking about, but since then I have figured out a new way to interpret the data. Mu is the mean of each component, the latency.

Trending Sources

Understanding the Importance of 5 Nines Availability

IO River

Delta Air Lines experienced a severe system outage in 2017, resulting in flight cancellations and delays across their network. The stakes are even higher during high-traffic periods such as Black Friday or Cyber Monday. High availability is a business imperative in this sector.

Ciao Milano! – An AWS Region is coming to Italy!

All Things Distributed

Since then, AWS added another PoP in Palermo in 2017. The website went online in less than one month and was able to support a 250 percent increase in traffic around the launch of the Aventador J. To meet such large traffic numbers, they need a technology infrastructure that is secure, reliable, and flexible.

The Speed of Time

Brendan Gregg

A Cassandra database cluster had switched to Ubuntu and noticed write latency increased by over 30%. Since instances of both CentOS and Ubuntu were running in parallel, I could collect flame graphs at the same time (same time-of-day traffic mix) and compare them side by side.

Solaris to Linux Migration 2017

Brendan Gregg

Here's some output from my zfsdist tool, in bcc/BPF, which measures ZFS latency as a histogram on Linux: # zfsdist. Tracing ZFS operation latency. Many new tools can now be written, and the main toolkit we're working on is [bcc]. Hit Ctrl-C to end. ^C The OS is becoming a forgotten cog in a much larger cloud-based system.